Sparse Linear Regression and Lattice Problems (2402.14645v1)

Published 22 Feb 2024 in cs.LG and stat.ML

Abstract: Sparse linear regression (SLR) is a well-studied problem in statistics where one is given a design matrix $X\in\mathbb{R}^{m\times n}$ and a response vector $y=X\theta^*+w$ for a $k$-sparse vector $\theta^*$ (that is, $\|\theta^*\|_0\leq k$) and small, arbitrary noise $w$, and the goal is to find a $k$-sparse $\widehat{\theta} \in \mathbb{R}^n$ that minimizes the mean squared prediction error $\frac{1}{m}\|X\widehat{\theta}-X\theta^*\|_2^2$. While $\ell_1$-relaxation methods such as basis pursuit, Lasso, and the Dantzig selector solve SLR when the design matrix is well-conditioned, no general algorithm is known, nor is there any formal evidence of hardness in an average-case setting with respect to all efficient algorithms. We give evidence of average-case hardness of SLR w.r.t. all efficient algorithms assuming the worst-case hardness of lattice problems. Specifically, we give an instance-by-instance reduction from a variant of the bounded distance decoding (BDD) problem on lattices to SLR, where the condition number of the lattice basis that defines the BDD instance is directly related to the restricted eigenvalue condition of the design matrix, which characterizes some of the classical statistical-computational gaps for sparse linear regression. Also, by appealing to worst-case to average-case reductions from the world of lattices, this shows hardness for a distribution of SLR instances; while the design matrices are ill-conditioned, the resulting SLR instances are in the identifiable regime. Furthermore, for well-conditioned (essentially) isotropic Gaussian design matrices, where Lasso is known to behave well in the identifiable regime, we show hardness of outputting any good solution in the unidentifiable regime where there are many solutions, assuming the worst-case hardness of standard and well-studied lattice problems.
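To make the problem setup concrete, the following is a minimal sketch (not taken from the paper) of an SLR instance in the well-conditioned, identifiable regime mentioned at the end of the abstract: an (essentially) isotropic Gaussian design, a $k$-sparse $\theta^*$, and an $\ell_1$-relaxation (Lasso, here via scikit-learn) used to estimate a good predictor. All problem sizes, the noise level, and the regularization weight are illustrative choices.

```python
# Illustrative sketch: sparse linear regression with an isotropic Gaussian
# design, the well-conditioned regime where Lasso is known to behave well.
# Dimensions, noise level, and the regularization weight are arbitrary
# demonstration values, not parameters from the paper.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
m, n, k = 200, 1000, 5  # samples, ambient dimension, sparsity

# k-sparse ground truth theta* and isotropic Gaussian design X
theta_star = np.zeros(n)
theta_star[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
X = rng.normal(size=(m, n))
w = 0.05 * rng.normal(size=m)  # small noise
y = X @ theta_star + w

# l1-relaxation: Lasso with a (hand-picked) regularization weight alpha
model = Lasso(alpha=0.05, fit_intercept=False, max_iter=10_000)
model.fit(X, y)
theta_hat = model.coef_

# Mean squared prediction error (1/m) * ||X theta_hat - X theta*||_2^2
mspe = np.mean((X @ (theta_hat - theta_star)) ** 2)
print("recovered support:", np.flatnonzero(np.abs(theta_hat) > 1e-3))
print("mean squared prediction error:", mspe)
```

The paper's hardness results concern regimes outside this easy setting: reductions that produce ill-conditioned designs (where the restricted eigenvalue condition fails and Lasso-type guarantees break down) and, for isotropic Gaussian designs, the unidentifiable regime with many consistent solutions.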
