
Sample-Efficient Linear Regression with Self-Selection Bias (2402.14229v1)

Published 22 Feb 2024 in math.ST, cs.DS, cs.LG, and stat.TH

Abstract: We consider the problem of linear regression with self-selection bias in the unknown-index setting, as introduced in recent work by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [STOC 2023]. In this model, one observes $m$ i.i.d. samples $(\mathbf{x}_{\ell}, z_{\ell})_{\ell=1}^{m}$ where $z_{\ell} = \max_{i \in [k]} \{\mathbf{x}_{\ell}^{T}\mathbf{w}_{i} + \eta_{i,\ell}\}$, but the maximizing index $i_{\ell}$ is unobserved. Here, the $\mathbf{x}_{\ell}$ are assumed to be $\mathcal{N}(0, I_{n})$ and the noise distribution $\boldsymbol{\eta}_{\ell} \sim \mathcal{D}$ is centered and independent of $\mathbf{x}_{\ell}$. We provide a novel and near optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_{1}, \ldots, \mathbf{w}_{k} \in \mathbb{R}^{n}$ up to additive $\ell_{2}$-error $\varepsilon$ with polynomial sample complexity $\tilde{O}(n) \cdot \mathsf{poly}(k, 1/\varepsilon)$ and significantly improved time complexity $\mathsf{poly}(n, k, 1/\varepsilon) + O(\log(k)/\varepsilon)^{O(k)}$. When $k = O(1)$, our algorithm runs in $\mathsf{poly}(n, 1/\varepsilon)$ time, generalizing the polynomial guarantee of an explicit moment matching algorithm of Cherapanamjeri et al. for $k = 2$ and when it is known that $\mathcal{D} = \mathcal{N}(0, I_{k})$. Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression where the added noise is taken outside the maximum. For this problem, our algorithm is efficient in a much larger range of $k$ than the state-of-the-art due to Ghosh, Pananjady, Guntuboyina, and Ramchandran [IEEE Trans. Inf. Theory 2022] for not too small $\varepsilon$, and leads to improved algorithms for any $\varepsilon$ by providing a warm start for existing local convergence methods.
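To make the observation model concrete, here is a minimal NumPy sketch of the two data-generating processes described in the abstract, together with one step of a textbook alternating-minimization refinement for the max-linear variant. The sizes `n`, `k`, `m`, the choice $\mathcal{D} = \mathcal{N}(0, I_k)$, and the helper `am_step` are illustrative assumptions for this sketch, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): dimension n, number of
# regressors k, sample count m.
n, k, m = 10, 3, 50_000

# Ground-truth regressors w_1, ..., w_k, stacked as rows of W.
W = rng.normal(size=(k, n))

# Covariates x_ell ~ N(0, I_n); noise eta_ell ~ D, taken here to be
# N(0, I_k), the special case where the prior moment-matching
# guarantee of Cherapanamjeri et al. applies.
X = rng.normal(size=(m, n))
eta = rng.normal(size=(m, k))

# Unknown-index self-selection: z_ell = max_i { <x_ell, w_i> + eta_{i,ell} };
# the maximizing index is latent and never shown to the learner.
z = (X @ W.T + eta).max(axis=1)

# Max-linear variant: the noise is added outside the maximum,
# z_ell = max_i { <x_ell, w_i> } + eta_ell.
z_maxlin = (X @ W.T).max(axis=1) + rng.normal(size=m)

# Naive least squares on (X, z) recovers no single w_i: self-selection
# mixes the k regressors together.
w_ols, *_ = np.linalg.lstsq(X, z, rcond=None)
print("ell_2 distance from OLS fit to each w_i:",
      np.round(np.linalg.norm(W - w_ols, axis=1), 3))

def am_step(W_hat, X, z):
    """One alternating-minimization step for max-linear regression:
    assign each sample to its current argmax piece, then re-fit each
    w_i by least squares on its assigned samples (a standard local
    method, not the paper's algorithm)."""
    assign = (X @ W_hat.T).argmax(axis=1)
    W_next = W_hat.copy()
    for i in range(W_hat.shape[0]):
        mask = assign == i
        if mask.sum() >= X.shape[1]:  # enough samples to re-fit
            W_next[i], *_ = np.linalg.lstsq(X[mask], z[mask], rcond=None)
    return W_next

# Given a warm start near the truth, a few AM steps contract the error;
# supplying such a warm start is one use of the paper's guarantees.
W_hat = W + 0.3 * rng.normal(size=W.shape)
for _ in range(5):
    W_hat = am_step(W_hat, X, z_maxlin)
print("ell_2 error per w_i after AM refinement:",
      np.round(np.linalg.norm(W_hat - W, axis=1), 3))
```

This sketch only illustrates the setting and the role of a warm start; the paper's sample- and time-efficient recovery procedure itself is not reproduced here.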

References (32)
  1. Takeshi Amemiya. A Note on a Fair and Jaffee Model. Econometrica, 42(4):759–762, 1974. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/1913944.
  2. Susan Athey and Philip A. Haile. Identification of standard auction models. Econometrica, 70(6):2107–2140, 2002. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/3081982.
  3. Gábor Balázs. Convex Regression: Theory, Practice, and Applications. PhD Thesis, University of Alberta, 2016.
  4. Rajendra Bhatia. Matrix Analysis. Graduate Texts in Mathematics. Springer, 1997. ISBN 9783540948469. URL https://books.google.com/books?id=f0ioPwAACAAJ.
  5. Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. J. ACM, 36(4):929–965, 1989. ISSN 0004-5411. doi: 10.1145/76359.76371. URL https://doi.org/10.1145/76359.76371.
  6. Sitan Chen, Jerry Li, and Zhao Song. Learning mixtures of linear regressions in subexponential time via fourier moments. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 587–600. ACM, 2020. doi: 10.1145/3357713.3384333. URL https://doi.org/10.1145/3357713.3384333.
  7. Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Andrew Ilyas, and Manolis Zampetakis. Estimation of standard auction models. In David M. Pennock, Ilya Segal, and Sven Seuken, editors, EC ’22: The 23rd ACM Conference on Economics and Computation, Boulder, CO, USA, July 11 - 15, 2022, pages 602–603. ACM, 2022. doi: 10.1145/3490486.3538284. URL https://doi.org/10.1145/3490486.3538284.
  8. Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Andrew Ilyas, and Manolis Zampetakis. What Makes a Good Fisherman? Linear Regression under Self-Selection Bias. In Barna Saha and Rocco A. Servedio, editors, Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, pages 1699–1712. ACM, 2023. doi: 10.1145/3564246.3585177. URL https://doi.org/10.1145/3564246.3585177.
  9. Anindya De, Elchanan Mossel, and Joe Neeman. Is your function low dimensional? In Alina Beygelzimer and Daniel Hsu, editors, Conference on Learning Theory, COLT 2019, 25-28 June 2019, Phoenix, AZ, USA, volume 99 of Proceedings of Machine Learning Research, pages 979–993. PMLR, 2019a. URL http://proceedings.mlr.press/v99/de19a.html.
  10. Anindya De, Elchanan Mossel, and Joe Neeman. Junta Correlation is Testable. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 1549–1563. IEEE Computer Society, 2019b. doi: 10.1109/FOCS.2019.00090. URL https://doi.org/10.1109/FOCS.2019.00090.
  11. Ilias Diakonikolas and Daniel M. Kane. Small covers for near-zero sets of polynomials and learning latent variable models. In Sandy Irani, editor, 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 184–195. IEEE, 2020. doi: 10.1109/FOCS46700.2020.00026. URL https://doi.org/10.1109/FOCS46700.2020.00026.
  12. Statistical Query Lower Bounds for List-Decodable Linear Regression. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 3191–3204, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/19b1b73d63d4c9ea79f8ca57e9d67095-Abstract.html.
  13. Agnostically learning multi-index models with queries. CoRR, abs/2312.16616, 2023. doi: 10.48550/ARXIV.2312.16616. URL https://doi.org/10.48550/arXiv.2312.16616.
  14. Ray C. Fair and Dwight M. Jaffee. Methods of Estimation for Markets in Disequilibrium. Econometrica, 40(3):497–514, 1972. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/1913181.
  15. Avishek Ghosh, Ashwin Pananjady, Adityanand Guntuboyina, and Kannan Ramchandran. Max-affine regression: Parameter estimation for Gaussian designs. IEEE Transactions on Information Theory, 68(3):1851–1885, 2022. doi: 10.1109/TIT.2021.3130717.
  16. Lauren A. Hannah and David B. Dunson. Multivariate Convex Regression with Adaptive Partitioning. Journal of Machine Learning Research, 14(102):3261–3294, 2013. URL http://jmlr.org/papers/v14/hannah13a.html.
  17. James J. Heckman. Sample selection bias as a specification error. Econometrica, 47(1):153–161, 1979. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/1912352.
  18. James J. Heckman. Selection Bias and Self-Selection, pages 1–18. Palgrave Macmillan UK, London, 2017. ISBN 978-1-349-95121-5. doi: 10.1057/978-1-349-95121-5_1762-2. URL https://doi.org/10.1057/978-1-349-95121-5_1762-2.
  19. Sushrut Karmalkar, Adam R. Klivans, and Pravesh Kothari. List-decodable Linear Regression. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 7423–7432, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/7f5fc754c7af0a6370c9bf91314e79f4-Abstract.html.
  20. Max-affine regression via first-order methods. CoRR, abs/2308.08070, 2023. doi: 10.48550/ARXIV.2308.08070. URL https://doi.org/10.48550/arXiv.2308.08070.
  21. Max-Linear Regression by Convex Programming. IEEE Transactions on Information Theory, pages 1–1, 2024. doi: 10.1109/TIT.2024.3350518.
  22. Jeongyeol Kwon and Constantine Caramanis. EM converges for a mixture of many linear regressions. In Silvia Chiappa and Roberto Calandra, editors, The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], volume 108 of Proceedings of Machine Learning Research, pages 1727–1736. PMLR, 2020. URL http://proceedings.mlr.press/v108/kwon20a.html.
  23. Lung-fei Lee. Self-Selection. A Companion to Theoretical Econometrics, pages 383–409, 2001.
  24. Alessandro Magnani and Stephen P. Boyd. Convex piecewise-linear fitting. Optimization and Engineering, 10:1–17, 2009.
  25. On learning mixture of linear regressions in the non-realizable setting. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato, editors, International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pages 17202–17220. PMLR, 2022. URL https://proceedings.mlr.press/v162/pal22b.html.
  26. Prasad Raghavendra and Morris Yau. List Decodable Learning via Sum of Squares. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 161–180. SIAM, 2020. doi: 10.1137/1.9781611975994.10. URL https://doi.org/10.1137/1.9781611975994.10.
  27. A. D. Roy. Some Thoughts on the Distribution of Earnings. Oxford Economic Papers, 3(2):135–146, 06 1951. ISSN 0030-7653. doi: 10.1093/oxfordjournals.oep.a041827. URL https://doi.org/10.1093/oxfordjournals.oep.a041827.
  28. Ramon van Handel. Probability in High Dimension. APC 550 Lecture Notes, Princeton University, 2016.
  29. Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Number 47 in Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018. ISBN 978-1-108-41519-4.
  30. Martin J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019.
  31. Robert J. Willis and Sherwin Rosen. Education and Self-Selection. Working Paper 249, National Bureau of Economic Research, June 1978. URL http://www.nber.org/papers/w0249.
  32. Xinyang Yi, Constantine Caramanis, and Sujay Sanghavi. Solving a mixture of many random linear equations by tensor decomposition and alternating minimization. CoRR, abs/1608.05749, 2016. URL http://arxiv.org/abs/1608.05749.
