
Regression Estimators (REG) Overview

Updated 23 March 2026
  • Regression Estimator (REG) is a statistical tool that maps observed data to regression parameters using unbiased, nonparametric, cluster-based, and robust approaches.
  • It is applied in diverse settings from classic linear regression to high-dimensional, time-varying, and adaptive models, ensuring precise inference across various data structures.
  • Empirical and theoretical studies show that REG methods reduce bias, control variance, and improve predictive accuracy even under heavy-tailed or outlier-prone conditions.

A regression estimator (REG) refers generically to any mapping from observed data to regression parameters, but in the academic literature, several distinct estimators have been proposed and labeled using the REG acronym or closely related terminology. Below, key classes and instantiations of regression estimators are systematically summarized, highlighting their mathematical constructions, large-sample properties, robustness, implementation, and application domains.

1. Classical and Unbiasedness-based Regression Estimators

A foundational approach to regression estimation is based on unbiasedness principles, requiring minimal distributional assumptions on the regressors.

  • Population formulation: With response $Y$ and regressors $X_1,\ldots,X_p$, the conditional mean is specified as $E[Y \mid X] = b_0 + X^T b$, where $b_0 \in \mathbb{R}$ and $b \in \mathbb{R}^p$. Without invoking an explicit error term, estimation is built on population covariances:

$$b = \mathrm{Cov}(X,X)^{-1}\,\mathrm{Cov}(Y,X), \qquad b_0 = E[Y] - E[X]^T b.$$

Sample analogs yield the unbiased estimator:

$$\hat b = S_{xx}^{-1} S_{yx}, \qquad \hat b_0 = \bar Y - \bar X^T \hat b,$$

under invertibility of $S_{xx}$. This estimator coincides with the ordinary least squares estimator under fixed design, extends naturally to AR($p$) time series, and admits unbiased variance estimation (Vellaisamy, 2015).
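The covariance-based construction is short to implement. The sketch below uses simulated data (sample size, coefficients, and noise are arbitrary choices for illustration) and forms $\hat b$ and $\hat b_0$ directly from the sample covariance analogs:

```python
import numpy as np

# Illustrative data: p = 2 regressors, n = 500 observations (arbitrary choices).
rng = np.random.default_rng(0)
n, p = 500, 2
X = rng.normal(size=(n, p))
Y = 1.0 + X @ np.array([2.0, -0.5]) + rng.normal(size=n)

# Sample analogs of Cov(X, X) and Cov(Y, X).
S_xx = np.cov(X, rowvar=False)
S_yx = np.cov(np.column_stack([Y, X]), rowvar=False)[0, 1:]

b_hat = np.linalg.solve(S_xx, S_yx)           # slope estimates
b0_hat = Y.mean() - X.mean(axis=0) @ b_hat    # intercept estimate
```

Because the $(n-1)$ factors in $S_{xx}$ and $S_{yx}$ cancel, `b_hat` agrees exactly with the slope part of an OLS fit that includes an intercept column.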

2. Distribution Regression Estimator (Nonparametric Likelihood Regression)

Distribution regression, as introduced by Chen, Ma, and Zhou, generalizes mean and quantile regression by modeling the entire error distribution nonparametrically.

  • Model Specification: For i.i.d. data $\{(x_i, Y_i)\}$ with $Y_i = \nu + x_i^T \beta + \epsilon_i$, the error density $f$ is left unspecified and may be asymmetric, heavy-tailed, or multimodal.
  • Likelihood Construction: The error distribution is estimated via kernel density methods:

$$\tilde f_{nh}(z) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{z - \epsilon_i(\beta)}{h}\right),$$

with a negligible constant added for numerical stability. The nonparametric log-likelihood is

$$\ell_{nh}(\beta) = \sum_{j=1}^n \log\left\{\frac{1}{nh}\sum_{i=1}^n K\left(\frac{Y_j - Y_i - (x_j - x_i)^T \beta}{h}\right) + n^{-1000}\right\}.$$

  • Estimation: The estimator $\hat\beta$ maximizes this nonconvex log-likelihood. Optimizers include the BB algorithm of Varadhan and Gilbert and local quadratic approximation schemes. The intercept $\nu$ is estimated post hoc given $\hat\beta$.
  • Asymptotic Theory: Under mild kernel regularity, $\hat\beta$ is $\sqrt{n}$-consistent and asymptotically normal, with limiting variance determined by the Fisher information of the unknown $f$.
  • Penalized Extension: An adaptive-LASSO penalized version

$$Q_n(\beta) = -\ell_{nh}(\beta)/n + \lambda_n \sum_{j=1}^p w_j |\beta_j|$$

achieves the oracle property for variable selection when $p \to \infty$, $p = O(\lambda_n \sqrt{n})$, and $\lambda_n \sqrt{n} \to \infty$.

  • Empirical Performance: REG matches mean or quantile regression under Gaussian or Laplace noise, but dominates both under heavy-tailed or multimodal errors, yielding lower bias and mean squared error. High-dimensional penalized REG consistently improves correct-fit rates and predictive error compared to adaptive-LASSO or LASSO quantile regression (Chen et al., 2017).
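The log-likelihood $\ell_{nh}(\beta)$ above is straightforward to evaluate. The sketch below uses a Gaussian kernel, a scalar regressor, Laplace errors, and a rule-of-thumb bandwidth, all of which are illustrative assumptions rather than choices from the paper; a tiny floating-point constant stands in for the paper's $n^{-1000}$ stabilizer:

```python
import numpy as np

def dr_loglik(beta, x, y, h):
    """Distribution-regression log-likelihood ell_nh(beta) with a Gaussian
    kernel (a sketch; any smooth kernel K would do). The tiny constant
    plays the role of the negligible n**(-1000) stabilizer."""
    n = len(y)
    resid = y - x * beta                           # epsilon_i(beta), up to nu
    diff = (resid[:, None] - resid[None, :]) / h   # (Y_j - Y_i - (x_j - x_i) beta) / h
    K = np.exp(-0.5 * diff**2) / np.sqrt(2.0 * np.pi)
    dens = K.sum(axis=1) / (n * h) + 1e-300
    return float(np.log(dens).sum())

# Heavy-tailed (Laplace) errors, scalar regressor (illustrative data).
rng = np.random.default_rng(1)
n, beta_true = 200, 2.0
x = rng.normal(size=n)
y = 0.5 + beta_true * x + rng.laplace(size=n)
h = 1.06 * y.std() * n ** (-1 / 5)                 # rule-of-thumb bandwidth
```

Note that the intercept $\nu$ cancels in the pairwise differences, which is why the objective identifies $\beta$ alone and $\nu$ is recovered afterwards.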

3. Regression Estimators in Clustered and High-dimensional Settings

Recent developments emphasize regression estimation tailored for clustered or high-dimensional data.

Cluster-average estimator:

  • Model: Data are divided into $G$ independent clusters, each with cluster-specific data matrices $(X_g, Y_g)$, all targeting a common parameter $\beta$.
  • Procedure: OLS is computed within each cluster to obtain $\hat\beta_g$, and the results are averaged:

$$\hat\beta_{\text{cluster}} = \frac{1}{G}\sum_{g=1}^G \hat\beta_g.$$

Equivalently, the estimator can be written as weighted least squares with a block-diagonal weight matrix $W$ whose blocks are $W_g = (X_g^T X_g)^{-1}$.

  • Variance and Inference: The variance admits a closed form, and a robust Wald-type test for $R\beta = r$ is constructed. Asymptotic normality holds under both fixed and random coefficient models as $G \to \infty$, regardless of cluster size imbalance or strong within-cluster dependence.
  • Comparison: In contrast to pooled OLS, the cluster-average estimator remains consistent under severe cluster imbalance, with lower empirical and asymptotic variance in unbalanced settings. Simulation studies demonstrate size control and superior power for modest $G$ and highly dominant clusters (Dey et al., 4 Feb 2026).
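The two-step procedure (per-cluster OLS, then a plain average) can be sketched in a few lines. The cluster count, size range, and coefficients below are arbitrary simulation choices, with deliberately unequal cluster sizes:

```python
import numpy as np

# Illustrative simulation: G unbalanced clusters sharing one true beta.
rng = np.random.default_rng(2)
beta_true = np.array([1.0, -2.0])
G = 20

betas = []
for g in range(G):
    n_g = int(rng.integers(30, 300))       # deliberately unequal cluster sizes
    X_g = rng.normal(size=(n_g, 2))
    Y_g = X_g @ beta_true + rng.normal(size=n_g)
    # Per-cluster OLS: beta_g = (X_g' X_g)^{-1} X_g' Y_g
    betas.append(np.linalg.lstsq(X_g, Y_g, rcond=None)[0])

beta_cluster = np.mean(betas, axis=0)      # simple average of cluster fits
```

Unlike pooled OLS, each cluster contributes equally to `beta_cluster` no matter how many rows it has, which is the source of the robustness to dominant clusters.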

Minimax and high-dimensional estimators:

  • Projected nearest neighbor (PNN) estimator: For $y = X\beta + g$ with $p \gg n$ and $\beta \in \ell_q^p(R)$, PNN projects onto a $k$-dimensional subspace and performs nearest-neighbor estimation in the orthogonal directions:

$$\hat y_P = P\tilde y + \arg\min_{z \in P^\perp K} \|P^\perp \tilde y - z\|_2,$$

where $K = X \ell_q^p(R)$. The minimax risk is characterized by the Kolmogorov width $d_k(K)$; PNN achieves risk within a factor of $O(\log p)^{1-q/2}$ of the minimax bound for any design. Efficient algorithms based on semidefinite relaxation are available for $q = 1$ (Zhang, 2012).

4. Robust and Adaptive Regression Estimators

Robustness and adaptivity to contamination are critical for reliable regression.

  • Shooting S-estimator: Designed for componentwise outlier contamination, this estimator sequentially updates regression coefficients via coordinate descent, embedding robust S-regressions. Each covariate is "cleaned" using hard rejection weights, and “partial residuals” are iteratively recomputed. The algorithm does not possess strict regression equivariance but considerably improves robustness to cellwise contamination compared to classic estimators, performing favorably in both simulation and real data (Öllerer et al., 2015).
  • Time-varying and nonlinear regression estimators: REG estimators based on least squares with dynamic extensions (LS+D, DREM) achieve global exponential convergence in identifiable settings, accommodate time-varying parameters via forgetting factors, and employ a mixing step (DREM) to decouple the regression into scalar subproblems. The approach extends to nonlinear parameterizations ($\Gamma(\theta)$ monotonic) and switched systems, ensuring bounded-input bounded-state stability with respect to disturbances, faster transients, and applicability under weaker excitation conditions than classical LS (Ortega et al., 2022).

5. Bayesian Regression Estimators

The Bayesian regression estimator, under a general loss function $\rho$, is defined as the minimizer of posterior-predictive expected loss:

$$t_n^*(x_1) = \arg\min_t \int \rho(y,t)\,\Pi_n(dy \mid x_1),$$

with $\Pi_n$ the posterior predictive distribution of $Y$ given $X_1 = x_1$ after $n$ observations. For squared-error loss, the estimator is the posterior predictive mean; for absolute-error loss, the posterior predictive median. Under minimal assumptions (identifiable model, finite second moment, standard Borel covariate space), these estimators are strongly consistent in both $L^1$ and $L^2$, and the corresponding Bayes risks converge to zero, justifying the predictive-distribution viewpoint for both parametric and nonparametric models (Nogales, 2022).
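The mean/median correspondence can be checked numerically. In the sketch below, a skewed Gamma sample stands in for Monte Carlo draws from a hypothetical posterior predictive (the distribution and its parameters are illustrative, not from the paper); minimizing the empirical expected loss over a grid recovers the mean under squared error and the median under absolute error:

```python
import numpy as np

# Draws standing in for the posterior predictive of Y given X1 = x1
# (a hypothetical skewed Gamma predictive, for illustration only).
rng = np.random.default_rng(3)
draws = rng.gamma(shape=2.0, scale=1.5, size=100_000)

t_sq = draws.mean()        # Bayes estimate under squared-error loss
t_abs = np.median(draws)   # Bayes estimate under absolute-error loss

def risk(t, loss):
    """Empirical posterior-predictive expected loss at candidate t."""
    return loss(draws - t).mean()

# Brute-force minimization of each expected loss over a grid.
grid = np.linspace(0.5, 6.0, 400)
t_sq_grid = grid[np.argmin([risk(t, np.square) for t in grid])]
t_abs_grid = grid[np.argmin([risk(t, np.abs) for t in grid])]
```

For a skewed predictive like this one the two estimates differ noticeably, which is exactly why the choice of loss matters.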

6. Computational and Algorithmic Considerations

  • Row-wise and memory-efficient regression: Most regression estimators for linear, regularized, and even nonlinear models (OLS, IV, ridge, LASSO, elastic net, probit, logit) depend on the data only through a small set of sufficient statistics (e.g., cross-products and right-hand sides). These can be accumulated in a single pass over the data, updating $K \times K$ (design) and $K \times 1$ (response) matrices without ever constructing the full $N \times K$ matrix in memory. Clustered and robust variance estimators require only minor modifications (per-cluster or per-block accumulators). This enables estimation on massive datasets under strict memory limits, matching the results of classic estimators and supporting inference tasks (homoskedastic, heteroskedastic, and clustered errors; bootstraps) efficiently (Clarke et al., 2023).
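A minimal sketch of the single-pass idea for OLS: accumulate $X^TX$ and $X^Ty$ chunk by chunk, then solve the normal equations. Chunk counts, sizes, and coefficients below are illustrative; the chunks are retained here only so the streaming result can be compared against an in-memory fit:

```python
import numpy as np

rng = np.random.default_rng(4)
K = 3
beta_true = np.array([0.5, 1.0, -1.0])

XtX = np.zeros((K, K))   # K x K cross-product accumulator
Xty = np.zeros(K)        # K x 1 right-hand-side accumulator
chunks = []

# Stream the data chunk by chunk; the full N x K matrix is never needed.
for _ in range(50):
    X_c = rng.normal(size=(100, K))
    y_c = X_c @ beta_true + rng.normal(size=100)
    XtX += X_c.T @ X_c
    Xty += X_c.T @ y_c
    chunks.append((X_c, y_c))             # kept only for verification

beta_stream = np.linalg.solve(XtX, Xty)   # OLS from sufficient statistics
```

The accumulators occupy $O(K^2)$ memory regardless of $N$, and the solution matches the full-data least-squares fit up to floating-point error.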

7. Comparative Summary of Major Regression Estimators

| Estimator | Assumptions & Strengths | Robustness/Adaptivity | Notable Properties |
|---|---|---|---|
| OLS/Unbiased REG | Arbitrary regressors, minimal moment requirements | Not robust to outliers | Unbiased, variance estimable, coincides with least squares under fixed $X$ |
| Distribution REG | Unknown, arbitrary error distribution $f$ | Robust to asymmetry, heavy tails, multimodality | $\sqrt{n}$-consistent; penalized version enjoys oracle property |
| Cluster-average REG | Arbitrary cluster design, strong within-cluster dependence | Robust to cluster size imbalance | Consistent, powerful Wald tests, valid for random coefficients |
| PNN/high-dim REG | High-dimensional ($p \gg n$), soft sparsity constraints | Minimax optimal, adaptive radius selection | Achieves risk close to theoretical minimax bound; efficient SDP approach |
| Shooting S-estimator | Cellwise (componentwise) contamination | High breakdown point versus cell outliers | Bounded influence, coordinate descent, lacks full regression equivariance |
| Adaptive LS+D/DREM REG | Nonlinear, time-varying, switched, weak excitation | BIBS stable, tracks parameter changes | Monotonicity decoupling, exponential convergence, robust transients |
| Bayesian Regression | Arbitrary model/likelihood, prior $Q$ | Posterior-predictive optimality | Consistency in $L^1$/$L^2$; vanishing Bayes risk under weak assumptions |

References

  • (Vellaisamy, 2015) — Unbiasedness approach and relationships to OLS.
  • (Chen et al., 2017) — Distribution regression, nonparametric likelihood approaches, oracle property.
  • (Dey et al., 4 Feb 2026) — Cluster-average regression, robust inference with clustered data.
  • (Zhang, 2012) — Projected nearest neighbor estimation for high-dimensional regimes.
  • (Öllerer et al., 2015) — Robust shooting S-estimator for cellwise outliers.
  • (Ortega et al., 2022) — Adaptive LS + D/DREM regression for nonlinear, time-varying, and switched systems.
  • (Nogales, 2022) — Bayesian regression estimator and predictive distribution consistency.
  • (Clarke et al., 2023) — Memory-efficient row-wise regression estimation.

These developments delineate the current theoretical and algorithmic landscape for regression estimation, encompassing classical, robust, high-dimensional, cluster-aware, adaptive, and Bayesian regimes, with rigorous results supporting principled application across diverse data structures and inferential demands.
