Locality Regularized Reconstruction: Structured Sparsity and Delaunay Triangulations (2405.00837v1)
Abstract: Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction. Given a set of points $[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] = \mathbf{X} \in \mathbb{R}^{d \times n}$ and a vector $\mathbf{y} \in \mathbb{R}^d$, the goal is to find coefficients $\mathbf{w} \in \mathbb{R}^n$ so that $\mathbf{X} \mathbf{w} \approx \mathbf{y}$, subject to some desired structure on $\mathbf{w}$. In this work we seek $\mathbf{w}$ that forms a local reconstruction of $\mathbf{y}$ by solving a regularized least squares regression problem. We obtain local solutions through a locality function, used as a regularization term, that promotes the use of columns of $\mathbf{X}$ that are close to $\mathbf{y}$. We prove that, for all levels of regularization and under a mild condition that the columns of $\mathbf{X}$ have a unique Delaunay triangulation, the optimal coefficients have at most $d+1$ non-zero entries, thereby providing local sparse solutions when $d \ll n$. Under the same condition we also show that for any $\mathbf{y}$ contained in the convex hull of $\mathbf{X}$ there exists a regime of the regularization parameter in which the optimal coefficients are supported on the vertices of the Delaunay simplex containing $\mathbf{y}$. This interprets the sparsity as structure obtained implicitly from the Delaunay triangulation of $\mathbf{X}$. We demonstrate that our locality regularized problem can be solved in time comparable to other methods that identify the containing Delaunay simplex.
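The abstract does not spell out the exact objective, but a common form of such a locality regularizer is the weighted squared-distance term $\sum_i w_i \|\mathbf{x}_i - \mathbf{y}\|_2^2$ combined with nonnegativity and sum-to-one constraints on $\mathbf{w}$. Under that assumption, the problem

$$\min_{\mathbf{w} \ge 0,\ \mathbf{1}^\top \mathbf{w} = 1} \ \|\mathbf{X}\mathbf{w} - \mathbf{y}\|_2^2 + \lambda \sum_{i=1}^n w_i \|\mathbf{x}_i - \mathbf{y}\|_2^2$$

is a quadratic program, and a minimal sketch of solving it with a generic QP solver (CVXOPT here) looks like the following. The function name and the exact objective are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np
from cvxopt import matrix, solvers

def locality_regularized_weights(X, y, lam):
    """Hypothetical sketch: solve
        min_w ||X w - y||^2 + lam * sum_i w_i * ||x_i - y||^2
        s.t.  w >= 0,  sum_i w_i = 1
    as a quadratic program. X is d x n, y has length d."""
    d, n = X.shape
    dist2 = np.sum((X - y[:, None]) ** 2, axis=0)   # ||x_i - y||^2 for each column
    P = matrix(2.0 * (X.T @ X))                     # quadratic term of the objective
    q = matrix(-2.0 * (X.T @ y) + lam * dist2)      # linear term (fit + locality penalty)
    G = matrix(-np.eye(n))                          # -w <= 0 encodes w >= 0
    h = matrix(np.zeros(n))
    A = matrix(np.ones((1, n)))                     # sum(w) = 1
    b = matrix(1.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol["x"]).ravel()
```

If the abstract's claims carry over to this assumed formulation, the returned weights should have at most $d+1$ non-zero entries, and for a suitable range of `lam` their support should coincide with the vertices of the Delaunay simplex containing $\mathbf{y}$ whenever $\mathbf{y}$ lies in the convex hull of the columns of $\mathbf{X}$.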