
On Interpretable Approaches to Cluster, Classify and Represent Multi-Subspace Data via Minimum Lossy Coding Length based on Rate-Distortion Theory (2302.10383v1)

Published 21 Feb 2023 in cs.CV

Abstract: Clustering, classification, and representation are three fundamental objectives of learning from high-dimensional data with intrinsic structure. To these ends, this paper introduces three interpretable approaches: segmentation (clustering) via the Minimum Lossy Coding Length criterion, classification via the Minimum Incremental Coding Length criterion, and representation via the Maximal Coding Rate Reduction criterion. All three are derived from the lossy data coding and compression framework based on the rate-distortion principle of information theory. The resulting algorithms are particularly suited to finite-sample data (which may be sparse or nearly degenerate) drawn from mixtures of Gaussian distributions or subspaces. Their theoretical value and attractive features are summarized through comparison with other learning methods and evaluation criteria. This summary note aims to provide a theoretical guide for researchers and engineers interested in understanding 'white-box' machine (deep) learning methods.
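The three criteria named above are built from the same lossy-coding quantities: the number of bits L(X) needed to encode a finite set of samples up to an allowable distortion ε², and the per-sample coding rate R(Z, ε) it induces. The sketch below is my own minimal illustration, not the authors' code; it implements the standard log-det coding-length and coding-rate formulas from the lossy-coding literature the paper builds on, together with the rate-reduction gap ΔR that the Maximal Coding Rate Reduction criterion maximizes. The function names, the choice of ε, and the toy data are illustrative assumptions.

import numpy as np

def coding_length(X, eps=0.5):
    # Bits needed to encode the n columns of X (d x n) up to distortion eps^2:
    # L(X) = (n + d)/2 * log2 det(I + d/(eps^2 n) X X^T).
    d, n = X.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (eps**2 * n)) * X @ X.T)
    return (n + d) / 2.0 * logdet / np.log(2)

def coding_rate(Z, eps=0.5):
    # Average coding rate R(Z, eps) = 1/2 log det(I + d/(n eps^2) Z Z^T).
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    # Delta R = R(Z) - sum_j (n_j / n) R(Z_j): the gap between coding all
    # features together and coding each class (or cluster) separately.
    n = Z.shape[1]
    within = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        within += (Zc.shape[1] / n) * coding_rate(Zc, eps)
    return coding_rate(Z, eps) - within

# Toy usage: points on two distinct 1-D subspaces of R^5, 50 samples each.
rng = np.random.default_rng(0)
u, v = rng.standard_normal(5), rng.standard_normal(5)
Z = np.concatenate([np.outer(u, rng.standard_normal(50)),
                    np.outer(v, rng.standard_normal(50))], axis=1)
labels = np.array([0] * 50 + [1] * 50)
print(rate_reduction(Z, labels), rate_reduction(Z, rng.permutation(labels)))

On this toy data, features concentrated on two distinct low-dimensional subspaces yield a markedly larger rate reduction under the true labels than under shuffled labels, which is exactly the structure the representation criterion rewards.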

