
Sample-Efficient Linear Regression with Self-Selection Bias (2402.14229v1)

Published 22 Feb 2024 in math.ST, cs.DS, cs.LG, and stat.TH

Abstract: We consider the problem of linear regression with self-selection bias in the unknown-index setting, as introduced in recent work by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [STOC 2023]. In this model, one observes $m$ i.i.d. samples $(\mathbf{x}_{\ell}, z_{\ell})_{\ell=1}^{m}$ where $z_{\ell} = \max_{i \in [k]} \{\mathbf{x}_{\ell}^{T}\mathbf{w}_{i} + \eta_{i,\ell}\}$, but the maximizing index $i_{\ell}$ is unobserved. Here, the $\mathbf{x}_{\ell}$ are assumed to be $\mathcal{N}(0, I_{n})$ and the noise distribution $\boldsymbol{\eta}_{\ell} \sim \mathcal{D}$ is centered and independent of $\mathbf{x}_{\ell}$. We provide a novel and near optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_{1}, \ldots, \mathbf{w}_{k} \in \mathbb{R}^{n}$ up to additive $\ell_{2}$-error $\varepsilon$ with polynomial sample complexity $\tilde{O}(n) \cdot \mathsf{poly}(k, 1/\varepsilon)$ and significantly improved time complexity $\mathsf{poly}(n, k, 1/\varepsilon) + O(\log(k)/\varepsilon)^{O(k)}$. When $k = O(1)$, our algorithm runs in $\mathsf{poly}(n, 1/\varepsilon)$ time, generalizing the polynomial guarantee of an explicit moment matching algorithm of Cherapanamjeri et al. for $k = 2$ and when it is known that $\mathcal{D} = \mathcal{N}(0, I_{k})$. Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression where the added noise is taken outside the maximum. For this problem, our algorithm is efficient in a much larger range of $k$ than the state-of-the-art due to Ghosh, Pananjady, Guntuboyina, and Ramchandran [IEEE Trans. Inf. Theory 2022] for not too small $\varepsilon$, and leads to improved algorithms for any $\varepsilon$ by providing a warm start for existing local convergence methods.
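To make the observation model concrete, here is a minimal NumPy sketch of the two data-generating processes described in the abstract, together with one step of a textbook alternating-minimization refinement for the max-linear variant. The sizes `n`, `k`, `m`, the choice $\mathcal{D} = \mathcal{N}(0, I_k)$, and the helper `am_step` are illustrative assumptions for this sketch, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): dimension n, number of
# regressors k, sample count m.
n, k, m = 10, 3, 50_000

# Ground-truth regressors w_1, ..., w_k, stacked as rows of W.
W = rng.normal(size=(k, n))

# Covariates x_ell ~ N(0, I_n); noise eta_ell ~ D, taken here to be
# N(0, I_k), the special case where the prior moment-matching
# guarantee of Cherapanamjeri et al. applies.
X = rng.normal(size=(m, n))
eta = rng.normal(size=(m, k))

# Unknown-index self-selection: z_ell = max_i { <x_ell, w_i> + eta_{i,ell} };
# the maximizing index is latent and never shown to the learner.
z = (X @ W.T + eta).max(axis=1)

# Max-linear variant: the noise is added outside the maximum,
# z_ell = max_i { <x_ell, w_i> } + eta_ell.
z_maxlin = (X @ W.T).max(axis=1) + rng.normal(size=m)

# Naive least squares on (X, z) recovers no single w_i: self-selection
# mixes the k regressors together.
w_ols, *_ = np.linalg.lstsq(X, z, rcond=None)
print("ell_2 distance from OLS fit to each w_i:",
      np.round(np.linalg.norm(W - w_ols, axis=1), 3))

def am_step(W_hat, X, z):
    """One alternating-minimization step for max-linear regression:
    assign each sample to its current argmax piece, then re-fit each
    w_i by least squares on its assigned samples (a standard local
    method, not the paper's algorithm)."""
    assign = (X @ W_hat.T).argmax(axis=1)
    W_next = W_hat.copy()
    for i in range(W_hat.shape[0]):
        mask = assign == i
        if mask.sum() >= X.shape[1]:  # enough samples to re-fit
            W_next[i], *_ = np.linalg.lstsq(X[mask], z[mask], rcond=None)
    return W_next

# Given a warm start near the truth, a few AM steps contract the error;
# supplying such a warm start is one use of the paper's guarantees.
W_hat = W + 0.3 * rng.normal(size=W.shape)
for _ in range(5):
    W_hat = am_step(W_hat, X, z_maxlin)
print("ell_2 error per w_i after AM refinement:",
      np.round(np.linalg.norm(W_hat - W, axis=1), 3))
```

This sketch only illustrates the setting and the role of a warm start; the paper's sample- and time-efficient recovery procedure itself is not reproduced here.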

References (32)
  1. Takeshi Amemiya. A Note on a Fair and Jaffee Model. Econometrica, 42(4):759–762, 1974. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/1913944.
  2. Susan Athey and Philip A. Haile. Identification of standard auction models. Econometrica, 70(6):2107–2140, 2002. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/3081982.
  3. Gábor Balázs. Convex Regression: Theory, Practice, and Applications. PhD Thesis, University of Alberta, 2016.
  4. Rajendra Bhatia. Matrix Analysis. Graduate Texts in Mathematics. Springer, 1997. ISBN 9783540948469. URL https://books.google.com/books?id=f0ioPwAACAAJ.
  5. Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. J. ACM, 36(4):929–965, 1989. ISSN 0004-5411. doi: 10.1145/76359.76371. URL https://doi.org/10.1145/76359.76371.
  6. Sitan Chen, Jerry Li, and Zhao Song. Learning mixtures of linear regressions in subexponential time via fourier moments. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 587–600. ACM, 2020. doi: 10.1145/3357713.3384333. URL https://doi.org/10.1145/3357713.3384333.
  7. Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Andrew Ilyas, and Manolis Zampetakis. Estimation of standard auction models. In David M. Pennock, Ilya Segal, and Sven Seuken, editors, EC ’22: The 23rd ACM Conference on Economics and Computation, Boulder, CO, USA, July 11 - 15, 2022, pages 602–603. ACM, 2022. doi: 10.1145/3490486.3538284. URL https://doi.org/10.1145/3490486.3538284.
  8. Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Andrew Ilyas, and Manolis Zampetakis. What Makes a Good Fisherman? Linear Regression under Self-Selection Bias. In Barna Saha and Rocco A. Servedio, editors, Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 1699–1712. ACM, 2023. doi: 10.1145/3564246.3585177. URL https://doi.org/10.1145/3564246.3585177.
  9. Anindya De, Elchanan Mossel, and Joe Neeman. Is your function low dimensional? In Alina Beygelzimer and Daniel Hsu, editors, Conference on Learning Theory, COLT 2019, 25-28 June 2019, Phoenix, AZ, USA, volume 99 of Proceedings of Machine Learning Research, pages 979–993. PMLR, 2019a. URL http://proceedings.mlr.press/v99/de19a.html.
  10. Anindya De, Elchanan Mossel, and Joe Neeman. Junta Correlation is Testable. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 1549–1563. IEEE Computer Society, 2019b. doi: 10.1109/FOCS.2019.00090. URL https://doi.org/10.1109/FOCS.2019.00090.
  11. Ilias Diakonikolas and Daniel M. Kane. Small covers for near-zero sets of polynomials and learning latent variable models. In Sandy Irani, editor, 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 184–195. IEEE, 2020. doi: 10.1109/FOCS46700.2020.00026. URL https://doi.org/10.1109/FOCS46700.2020.00026.
  12. Statistical Query Lower Bounds for List-Decodable Linear Regression. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 3191–3204, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/19b1b73d63d4c9ea79f8ca57e9d67095-Abstract.html.
  13. Agnostically learning multi-index models with queries. CoRR, abs/2312.16616, 2023. doi: 10.48550/ARXIV.2312.16616. URL https://doi.org/10.48550/arXiv.2312.16616.
  14. Ray C. Fair and Dwight M. Jaffee. Methods of Estimation for Markets in Disequilibrium. Econometrica, 40(3):497–514, 1972. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/1913181.
  15. Avishek Ghosh, Ashwin Pananjady, Adityanand Guntuboyina, and Kannan Ramchandran. Max-affine regression: Parameter estimation for Gaussian designs. IEEE Transactions on Information Theory, 68(3):1851–1885, 2022. doi: 10.1109/TIT.2021.3130717.
  16. Lauren A. Hannah and David B. Dunson. Multivariate Convex Regression with Adaptive Partitioning. Journal of Machine Learning Research, 14(102):3261–3294, 2013. URL http://jmlr.org/papers/v14/hannah13a.html.
  17. James J. Heckman. Sample selection bias as a specification error. Econometrica, 47(1):153–161, 1979. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/1912352.
  18. James J. Heckman. Selection Bias and Self-Selection, pages 1–18. Palgrave Macmillan UK, London, 2017. ISBN 978-1-349-95121-5. doi: 10.1057/978-1-349-95121-5_1762-2. URL https://doi.org/10.1057/978-1-349-95121-5_1762-2.
  19. Sushrut Karmalkar, Adam R. Klivans, and Pravesh Kothari. List-decodable Linear Regression. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 7423–7432, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/7f5fc754c7af0a6370c9bf91314e79f4-Abstract.html.
  20. Max-affine regression via first-order methods. CoRR, abs/2308.08070, 2023. doi: 10.48550/ARXIV.2308.08070. URL https://doi.org/10.48550/arXiv.2308.08070.
  21. Max-Linear Regression by Convex Programming. IEEE Transactions on Information Theory, pages 1–1, 2024. doi: 10.1109/TIT.2024.3350518.
  22. Jeongyeol Kwon and Constantine Caramanis. EM converges for a mixture of many linear regressions. In Silvia Chiappa and Roberto Calandra, editors, The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], volume 108 of Proceedings of Machine Learning Research, pages 1727–1736. PMLR, 2020. URL http://proceedings.mlr.press/v108/kwon20a.html.
  23. Lung-fei Lee. Self-Selection. A Companion to Theoretical Econometrics, pages 383–409, 2001.
  24. Alessandro Magnani and Stephen P. Boyd. Convex piecewise-linear fitting. Optimization and Engineering, 10:1–17, 2009.
  25. On learning mixture of linear regressions in the non-realizable setting. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato, editors, International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pages 17202–17220. PMLR, 2022. URL https://proceedings.mlr.press/v162/pal22b.html.
  26. Prasad Raghavendra and Morris Yau. List Decodable Learning via Sum of Squares. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 161–180. SIAM, 2020. doi: 10.1137/1.9781611975994.10. URL https://doi.org/10.1137/1.9781611975994.10.
  27. A. D. Roy. Some Thoughts on the Distribution of Earnings. Oxford Economic Papers, 3(2):135–146, 06 1951. ISSN 0030-7653. doi: 10.1093/oxfordjournals.oep.a041827. URL https://doi.org/10.1093/oxfordjournals.oep.a041827.
  28. Ramon van Handel. Probability in High Dimension. APC 550 Lecture Notes, Princeton University, 2016.
  29. Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Number 47 in Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018. ISBN 978-1-108-41519-4.
  30. Martin J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019.
  31. Robert J. Willis and Sherwin Rosen. Education and Self-Selection. Working Paper 249, National Bureau of Economic Research, June 1978. URL http://www.nber.org/papers/w0249.
  32. Xinyang Yi, Constantine Caramanis, and Sujay Sanghavi. Solving a mixture of many random linear equations by tensor decomposition and alternating minimization. CoRR, abs/1608.05749, 2016. URL http://arxiv.org/abs/1608.05749.
