Noiseless Linear Regression with Gaussian Covariates
- Noiseless linear regression under Gaussian covariates is defined by an exact linear relationship that enables perfect recovery of the unknown regressor when n ≥ d.
- Algorithmic approaches such as SVD-based reduction, row sampling, and lattice basis reduction address recovery challenges, including the NP-hardness of exact recovery when the correspondence between covariates and responses is unknown.
- The study establishes rigorous signal-to-noise ratio bounds and sample complexity thresholds, highlighting key computational-statistical tradeoffs especially under contamination.
Noiseless linear regression under Gaussian covariates is the study of the statistical and computational properties of linear regression models in the absence of additive noise, where responses are exact linear functions of Gaussian-distributed covariates. This regime is theoretically attractive, allowing perfect recovery under ideal conditions, and serves as a testbed to probe algorithmic hardness, information-computation gaps, and innovations in estimator and optimization methods.
1. Model Formulation and Fundamental Properties
The classical noiseless linear model specifies the relation
$$y_i = \langle x_i, \beta^* \rangle, \qquad i = 1, \dots, n,$$
where $x_1, \dots, x_n \sim \mathcal{N}(0, I_d)$ are i.i.d. covariates, $\beta^* \in \mathbb{R}^d$ is the unknown regressor, and there is no additive noise. In cases of correspondence uncertainty (the records of $y$ are scrambled relative to the rows of $X$), the relationship is expressed as
$$y = \Pi^* X \beta^*$$
for an unknown permutation matrix $\Pi^*$.
Key aspects:
- The covariate matrix $X \in \mathbb{R}^{n \times d}$ has i.i.d. rows drawn from the standard multivariate Gaussian.
- Responses $y$ are exact linear images of the covariates via $\beta^*$ (no stochastic error).
- Higher-order questions arise when labels are contaminated (see Section 5).
Fundamental consequence: With $n \geq d$ and $X$ of full column rank, $\beta^*$ can generally be perfectly reconstructed given known correspondence.
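A minimal numerical check of this consequence, assuming numpy (the dimensions and seed are arbitrary): with $n \geq d$ exact responses and known correspondence, ordinary least squares recovers $\beta^*$ to machine precision.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5                                 # n >= d: exact recovery regime

X = rng.standard_normal((n, d))              # i.i.d. N(0, I_d) covariate rows
beta_star = rng.standard_normal(d)           # unknown regressor
y = X @ beta_star                            # exact responses, no additive noise

# Ordinary least squares; with rank(X) = d and zero noise the fit is exact.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.max(np.abs(beta_hat - beta_star)))  # ~1e-15, i.e., machine precision
```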
2. Algorithmic Approaches for Recovery with Unknown Correspondence
When the matching between covariates and responses is lost, regression without correspondence is NP-hard in its exact form (by reduction from 3-Partition).
For constant dimension ($d$ fixed), a fully polynomial-time approximation scheme (FPTAS) exists (Hsu et al., 2017); a toy illustration of the objective it approximates appears after this list:
- A singular value decomposition $X = U \Sigma V^\top$ reduces the problem to one whose covariate matrix has orthonormal columns (absorbing $\Sigma V^\top$ into the regressor).
- Row sampling (Boutsidis et al.) selects a small subset of rows, yielding combinatorially defined sets of candidate right-hand sides.
- For each candidate assignment of responses to the sampled rows, solve a least squares problem restricted to those rows.
- Direct search over the small sampled sets and an $\varepsilon$-net argument ensure finding $(\hat{\Pi}, \hat{\beta})$ such that $\|\hat{\Pi} X \hat{\beta} - y\|_2^2 \leq (1+\varepsilon)\,\min_{\Pi,\beta}\|\Pi X \beta - y\|_2^2$.
- Algorithmic complexity: polynomial in $n$ and $1/\varepsilon$ for each fixed $d$ (roughly $(n/\varepsilon)^{O(d)}$).
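The full FPTAS is intricate; as a toy stand-in (not the algorithm itself), the following sketch, assuming numpy and a very small $n$ so that all $n!$ matchings can be enumerated, illustrates the objective $\min_{\Pi,\beta}\|\Pi X\beta - y\|_2$ that it approximates.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
n, d = 6, 2                                  # tiny instance: n! enumeration is feasible

X = rng.standard_normal((n, d))
beta_star = rng.standard_normal(d)
y_shuffled = rng.permutation(X @ beta_star)  # responses with lost correspondence

best = (np.inf, None)
for perm in permutations(range(n)):          # brute force stands in for the guided search
    Xp = X[list(perm)]                       # candidate row-to-response matching
    beta, *_ = np.linalg.lstsq(Xp, y_shuffled, rcond=None)
    err = np.linalg.norm(Xp @ beta - y_shuffled)
    if err < best[0]:
        best = (err, beta)

print("best residual:", best[0])             # ~0 in the noiseless case
print("recovered beta:", best[1], "true beta:", beta_star)
```

In the noiseless case the best residual is numerically zero and the associated least squares fit returns $\beta^*$.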
The average-case analysis achieves exact recovery:
- For Gaussian $X$ and exact responses $y = \Pi^* X \beta^*$, the solution reduces to a subset-sum problem, enabling exact identification of $\beta^*$ and the permutation $\Pi^*$ when $n \geq d + 1$ and the inputs are noise-free.
3. Lattice Basis Reduction for Noiseless Recovery
Subset-sum translation enables use of lattice basis reduction (the Lenstra–Lenstra–Lovász, LLL, algorithm); a toy sketch of the lattice encoding follows this list:
- Construct integer (quantized) coefficients from the responses $y$ and the columns of the pseudoinverse $X^+$.
- Define a target value and seek a subset of the coefficients summing exactly to that target.
- Build a lattice basis incorporating the coefficients and an offset vector, scaled by a suitably large weight.
- For correctly quantized instances, the shortest vector in the lattice yields the true permutation and thus recovers $\beta^*$ exactly when $n \geq d + 1$.
- The method is brittle in the presence of noise: the subset-sum structure is destroyed by even low-level contamination.
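As an illustration of the encoding only, the following sketch builds a Lagarias–Odlyzko-style lattice basis for a toy subset-sum instance; the integer weights, target, and scaling factor are illustrative stand-ins for the quantized coefficients derived from $y$ and $X^+$, and a brute-force search plays the role that LLL reduction would at realistic sizes.

```python
import numpy as np
from itertools import combinations

# Toy subset-sum instance; in the regression reduction the weights would be
# quantized coefficients built from y and the columns of X^+.
a = np.array([3, 34, 4, 12, 5, 2])
t = 9                                        # target, e.g. 3 + 4 + 2
n = len(a)
N = 10 * int(np.sum(np.abs(a)))              # large weight on the last coordinate

# Lagarias-Odlyzko-style basis (rows are basis vectors): a 0/1 solution x with
# a @ x == t corresponds to the short lattice vector (x_1, ..., x_n, 0).
B = np.zeros((n + 1, n + 1), dtype=int)
B[:n, :n] = np.eye(n, dtype=int)
B[:n, n] = N * a
B[n, n] = N * t

# Stand-in for LLL on this tiny instance: brute-force the short vector.
# At realistic sizes one would run basis reduction (e.g. LLL via fpylll) on B
# and read the subset off a shortest reduced vector.
for r in range(1, n + 1):
    for S in combinations(range(n), r):
        x = np.zeros(n, dtype=int)
        x[list(S)] = 1
        v = x @ B[:n] - B[n]                 # integer combination of basis rows
        if v[n] == 0:                        # last coordinate 0 <=> subset sums to t
            print("subset:", S, "short vector:", v)
```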
4. Signal-to-Noise Ratio Bounds and Recovery Limits
Rigorous lower bounds on the signal-to-noise ratio (SNR) delineate feasibility (Hsu et al., 2017); a small simulation illustrating this sensitivity follows the list:
- For standard Gaussian covariates, approximate recovery is impossible when $\mathrm{SNR} \leq c\, d/\log\log n$ for some absolute constant $c > 0$.
- No estimator can achieve small error for sub-threshold SNR: the minimax risk $\inf_{\hat{\beta}} \sup_{\beta^*} \mathbb{E}\,\|\hat{\beta} - \beta^*\|_2$ remains a constant fraction of the signal norm $\|\beta^*\|_2$.
- With covariates drawn uniformly from a bounded interval rather than from a Gaussian, analogous bounds hold with different constant thresholds.
- Compared with traditional regression, whose squared error scales as $\sigma^2 d/n$, the unlabeled setting is far less tolerant of noise.
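As a rough illustration of this noise sensitivity (not a reproduction of the bounds), the following sketch, assuming numpy, uses the one-dimensional case with $\beta^* > 0$, where the least-squares-optimal matching pairs sorted covariates with sorted responses; the resulting estimator is exact without noise and its error grows as the noise level rises.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta_star, trials = 200, 2.0, 50

for sigma in [0.0, 0.01, 0.1, 1.0]:          # noise level; SNR = beta_star**2 / sigma**2
    errs = []
    for _ in range(trials):
        x = rng.standard_normal(n)
        y = rng.permutation(beta_star * x + sigma * rng.standard_normal(n))
        # With d = 1 and beta_star > 0, the optimal matching pairs sorted x with sorted y.
        x_s, y_s = np.sort(x), np.sort(y)
        beta_hat = x_s @ y_s / (x_s @ x_s)   # least squares slope after re-matching
        errs.append(abs(beta_hat - beta_star))
    print(f"sigma={sigma:5.2f}  mean |beta_hat - beta*| = {np.mean(errs):.4f}")
```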
5. Robustness, Contamination, and Computational-Statistical Tradeoffs
When responses are contaminated (i.e., $y = \langle x, \beta^* \rangle + z$, with $z$ independent of $x$ and drawn from a distribution satisfying $\Pr[z = 0] \geq \alpha$), the sample complexity landscape is altered (Diakonikolas et al., 12 Oct 2025); a data-generation sketch appears at the end of this section:
- Information-theoretic recovery is achievable with $O(d/\alpha)$ samples.
- All efficient polynomial-time algorithms require $\Omega(d/\alpha^2)$ samples, a quadratic gap in $1/\alpha$ due to computational limits.
- In the Statistical Query (SQ) framework, any efficient algorithm needs simulation complexity of order at least $d/\alpha^2$.
- The distinction is formal and fundamental: computational hardness is not an artifact of existing methods but is rooted in problem structure.
Key formulas:
- Basic model: $y = \langle x, \beta^* \rangle + z$, $x \sim \mathcal{N}(0, I_d)$, $z$ independent of $x$, $\Pr[z = 0] \geq \alpha$.
- SQ lower bound: simulation complexity $\Omega(d/\alpha^2)$.
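A minimal data-generation sketch of this contamination model, assuming numpy (the value of $\alpha$ and the heavy-tailed contamination law are illustrative choices): ordinary least squares over all samples is thrown off by the contamination, while an oracle that knows which responses are exact recovers $\beta^*$ from a few multiples of $d/\alpha$ samples, matching the information-theoretic benchmark in spirit.

```python
import numpy as np

rng = np.random.default_rng(3)
d, alpha = 10, 0.05                           # alpha: probability that z = 0 (clean response)
n = int(5 * d / alpha)                        # a few multiples of d/alpha samples

X = rng.standard_normal((n, d))
beta_star = rng.standard_normal(d)

clean = rng.random(n) < alpha                 # which responses carry no contamination
z = np.where(clean, 0.0, 10.0 * rng.standard_cauchy(n))  # oblivious noise, independent of X
y = X @ beta_star + z

# Naive least squares uses every sample and is badly perturbed by the contamination.
beta_naive, *_ = np.linalg.lstsq(X, y, rcond=None)

# Oracle baseline: it knows which responses are uncontaminated, so ~d/alpha clean
# samples already give >= d exact equations and beta* is recovered exactly.
beta_oracle, *_ = np.linalg.lstsq(X[clean], y[clean], rcond=None)

print("naive error :", np.linalg.norm(beta_naive - beta_star))
print("oracle error:", np.linalg.norm(beta_oracle - beta_star))
```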
6. Connections to Nonparametric Rates and RKHS Interpretations
Extensions appear in nonparametric and infinite-dimensional settings (Berthier et al., 2020); a minimal SGD sketch follows the list:
- For the noiseless model $y_i = \langle \phi(x_i), \theta^* \rangle$, where covariates may be mapped into a Hilbert space via a feature map $\phi$ or interpreted as features in an RKHS, stochastic gradient descent (SGD) with constant step size achieves zero training error and polynomial decay of the generalization error, of order $n^{-\gamma}$, where the exponent $\gamma$ depends on the regularities of the optimal parameter and of the feature mapping.
- The RKHS framework translates these rates into Sobolev-smoothness statements: for kernels with polynomial spectral decay and target functions of a given smoothness, the convergence exponent depends on both the kernel decay and the function smoothness.
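A minimal sketch in the spirit of this setting, assuming numpy (the random-feature map, dimensions, and step size are illustrative stand-ins for an RKHS, not the paper's exact construction): single-pass SGD with a constant step size on noiseless responses keeps reducing both training and held-out error, since there is no noise floor.

```python
import numpy as np

rng = np.random.default_rng(4)
n_train, n_test, D = 2000, 500, 200           # D: random-feature dimension (proxy for an RKHS)

W = rng.standard_normal((D, 1))               # random-feature frequencies for 1-d inputs
def phi(x):                                   # bounded feature map, ||phi(x)||^2 <= 1
    return np.cos(x @ W.T + 0.5) / np.sqrt(D)

theta_star = rng.standard_normal(D)           # "true" parameter in feature space
x_tr = rng.standard_normal((n_train, 1))
x_te = rng.standard_normal((n_test, 1))
y_tr = phi(x_tr) @ theta_star                 # noiseless responses
y_te = phi(x_te) @ theta_star

theta, step = np.zeros(D), 1.0                # constant step size, single pass over the data
for xi, yi in zip(phi(x_tr), y_tr):
    theta += step * (yi - xi @ theta) * xi    # SGD update on the squared loss

print("train MSE:", np.mean((phi(x_tr) @ theta - y_tr) ** 2))
print("test  MSE:", np.mean((phi(x_te) @ theta - y_te) ** 2))
```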
7. Applications, Limitations, and Open Problems
Applications of noiseless Gaussian regression models include analysis of sensor networks with ambiguous measurement ordering, record linkage under privacy, and theoretical studies of estimator optimality under missing data.
Strengths:
- Under ideal (noiseless, precise) conditions, exact recovery algorithms yield unique solutions in low dimension with minimal sample size ($n = d + 1$).
- Fully polynomial-time approximation schemes make near-optimal estimation feasible for moderate $d$.
Limitations:
- Lattice-based methods are unacceptably sensitive to noise.
- All presented algorithms scale poorly in higher dimensions, especially when correspondence is missing.
- Information-computation gaps (quadratic sample complexity barrier) persist under contamination even for robust/efficient algorithms.
Open problems:
- Extending computational lower bounds for contaminated regression beyond SQ algorithms.
- Bridging the divide between theoretical possibility and practical, robust estimator construction in higher dimensions and under contamination.
In summary, noiseless linear regression under Gaussian covariates provides a foundational lens to explore exact recovery, correspondence uncertainty, robust regression, and computational-statistical limits, with both positive algorithmic results and sharp negative impossibility theorems systematically clarifying the boundaries of modern high-dimensional inference.