
Instrumental Variable Least Squares

Updated 26 February 2026
  • Instrumental Variable LS (IV-LS) is a method for consistently estimating linear models with endogenous regressors using a two-stage approach with valid instruments.
  • The methodology is enhanced by regularization techniques like ridge penalties and first-stage shrinkage, addressing variance issues and weak instruments.
  • Recent extensions adapt IV-LS to nonlinear, functional, and online settings, improving bias reduction, robustness, and overall efficiency.

Instrumental Variable Least Squares (IV-LS), commonly referred to as two-stage least squares (2SLS), is the foundational method for consistent estimation of linear models with endogenous regressors when valid instrumental variables (IVs) are available. Recent innovations extend IV-LS to high-dimensional, functional, nonparametric, nonlinear, and online contexts, focusing on regularization, bias reduction, robustness, and efficiency.

1. Classical Formulation and Methodology

The classical IV-LS (2SLS) framework considers a structural model

y = X\beta + \varepsilon, \qquad X = Z\Gamma + u,

where $y \in \mathbb{R}^n$ is the outcome, $X \in \mathbb{R}^{n \times k}$ possibly includes endogenous regressors, and $Z \in \mathbb{R}^{n \times m}$ ($m \ge k$) is a full-rank matrix of instruments. The standard two-stage procedure involves:

  • Stage 1: Regress the endogenous columns of $X$ on $Z$ to obtain fitted values $\hat X = P_Z X$ (where $P_Z$ is the orthogonal projection onto $\operatorname{col}(Z)$).
  • Stage 2: Regress $y$ on $\hat X$ using OLS, yielding the point estimator

\hat\beta_{\mathrm{2SLS}} = (X' P_Z X)^{-1} X' P_Z y.

Under instrument exogeneity ($\mathbb{E}[Z'\varepsilon] = 0$), relevance ($\operatorname{rank}(\mathbb{E}[Z'X]) = k$), and standard regularity conditions (full rank, homoskedasticity), $\hat\beta_{\mathrm{2SLS}}$ is consistent, whereas OLS is generally inconsistent when $X$ is endogenous (Ginestet et al., 2015; Guo et al., 2016).

The population moment condition is:

\mathbb{E}[Z'(y - X\beta)] = 0,

corresponding to the minimization of the IV quadratic objective function

J_0(\beta) = \frac{1}{2n}\,(y - X\beta)' P_Z (y - X\beta).
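The two stages above can be sketched directly in NumPy; the data-generating process below (instrument strengths, error correlation) is invented for illustration:

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS: beta_hat = (X' P_Z X)^{-1} X' P_Z y via two regressions."""
    # Stage 1: project X onto the column space of Z.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: OLS of y on the fitted values.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Simulated design: u drives both X and the structural error,
# so OLS is inconsistent while 2SLS is not.
rng = np.random.default_rng(0)
n = 50_000
Z = rng.normal(size=(n, 2))               # two instruments
u = rng.normal(size=n)                    # endogeneity source
X = (Z @ np.array([1.0, 0.5]) + u).reshape(-1, 1)
y = 2.0 * X[:, 0] + 0.8 * u + rng.normal(size=n)

beta_iv = two_stage_least_squares(y, X, Z)          # close to 2.0
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]     # biased upward
```

With this design the OLS slope converges to roughly $2 + 0.8/2.25 \approx 2.36$, while the 2SLS estimate concentrates around the structural value 2.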

2. Bias, Efficiency, and Regularization

While 2SLS eliminates bias from endogeneity under valid IVs, it often exhibits higher variance than OLS, especially with weak instruments.

  • Convex Least Squares (CLS): Combines OLS and 2SLS by minimizing the mean squared error (MSE) of $\hat\beta_{\mathrm{CLS}}(k) = k\,\hat\beta_{\mathrm{OLS}} + (1-k)\,\hat\beta_{\mathrm{2SLS}}$ over $k \in [0,1]$ (Ginestet et al., 2015). The optimal $k^*$ is data-driven and unique unless OLS and 2SLS have identical MSEs. Finite-sample and simulation evidence shows CLS outperforms pure 2SLS, especially with weak instruments.
  • Ridge-Path Estimation: To mitigate poor precision or a nearly flat IV criterion, a penalized objective adds a ridge penalty toward a prior $\beta^p$:

J_n(\beta;\lambda) = \frac{1}{2n}\,(y - X\beta)' P_Z (y - X\beta) + \frac{\lambda}{2}\,(\beta - \beta^p)'(\beta - \beta^p).

Sample splitting selects the regularization parameter $\hat\lambda$ by minimizing the unpenalized IV criterion on a test block, yielding the estimator

\hat\beta = (\hat\Sigma + \hat\lambda I)^{-1}\,[\hat\xi + \hat\lambda\,\beta^p],

where $\hat\Sigma = X_{\mathrm{train}}' P_{Z_{\mathrm{train}}} X_{\mathrm{train}} / n_{\mathrm{train}}$ and $\hat\xi = X_{\mathrm{train}}' P_{Z_{\mathrm{train}}} y_{\mathrm{train}} / n_{\mathrm{train}}$ (Sengupta et al., 2019). The ridge path provides large MSE reductions in small or weak-instrument samples. The joint limit law of $(\hat\beta, \hat\lambda)$ is a nonstandard mixture with positive mass at $\hat\lambda = 0$.

  • James–Stein First-Stage Shrinkage: For a scalar endogenous regressor $x$ and $\ell \ge 4$ instruments, shrinking the first-stage OLS estimator toward zero using a James–Stein factor strictly reduces finite-sample bias relative to standard 2SLS while preserving its invariance properties (Spiess, 2017). The final estimate is obtained via a control-function approach in the second stage.
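A minimal sketch of the ridge-path idea, assuming a simple half/half train-test split and a small $\lambda$ grid (the function name, grid, and simulated design are illustrative, not from Sengupta et al., 2019):

```python
import numpy as np

def ridge_path_iv(y, X, Z, beta_prior, lambdas, seed=0):
    """Sketch: fit the penalized IV estimator on a training block, then
    pick lambda by minimizing the unpenalized criterion J0 on the
    held-out block."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    tr, te = idx[: len(y) // 2], idx[len(y) // 2:]

    def projector(rows):
        Zr = Z[rows]
        return Zr @ np.linalg.pinv(Zr)      # projection onto col(Z_r)

    P_tr, P_te = projector(tr), projector(te)
    Sigma = X[tr].T @ P_tr @ X[tr] / len(tr)
    xi = X[tr].T @ P_tr @ y[tr] / len(tr)

    best = None
    for lam in lambdas:
        b = np.linalg.solve(Sigma + lam * np.eye(X.shape[1]),
                            xi + lam * beta_prior)
        r = y[te] - X[te] @ b
        J0 = r @ P_te @ r / (2 * len(te))   # unpenalized test criterion
        if best is None or J0 < best[0]:
            best = (J0, lam, b)
    return best[2], best[1]

# Simulated design; the prior beta^p = 0 shrinks toward zero.
rng = np.random.default_rng(3)
n = 2000
Z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
X = (Z @ np.array([1.0, 0.5]) + u).reshape(-1, 1)
y = 2.0 * X[:, 0] + 0.8 * u + rng.normal(size=n)
beta_rp, lam_hat = ridge_path_iv(y, X, Z, np.zeros(1), [0.0, 0.1, 1.0])
```

With strong instruments, as here, the held-out criterion tends to select little or no shrinkage; the penalty pays off mainly when the IV objective is nearly flat.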

3. Extensions: Nonlinear, Functional, and Modern IV-LS

Recent work generalizes IV-LS to various contemporary settings.

  • Kernel IV Regression (KIV): Extends 2SLS to nonlinear settings using reproducing kernel Hilbert spaces (RKHS). Stage 1 learns a conditional mean embedding $\hat\mu(\cdot)$ mapping $Z$ into the RKHS on $X$. Stage 2 regresses $y$ onto the fitted RKHS features using kernel ridge regression (Singh et al., 2019). KIV is minimax-optimal and outperforms classical IV when the structural relation is nonlinear.
  • Functional IV-LS: For scalar-on-function regression models with measurement error and functional instruments, basis expansion reduces the infinite-dimensional functional regression to a finite-dimensional two-stage IV-LS problem (Chen et al., 15 Sep 2025). Both pointwise and multivariate TSLS methods are considered, providing substantial reductions in bias and average integrated mean squared error (AIMSE) over naive or regression-calibration approaches.
  • System Identification and Time Series: IV-LS can debias least-squares identification of dynamical systems even when external instruments are unavailable, by sample splitting and filtering to create “internal” instruments (Kuang et al., 12 Nov 2025). Regularization ensures $L^p$ consistency at an $O(n^{-1/2})$ rate, with dramatic reductions in bias and root MSE (RMSE) on nonlinear time series models.
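As a rough finite-dimensional analogue of the kernelized two-stage idea, one can expand both the regressor and the instrument in a polynomial basis and run linear 2SLS on the expanded system. This is a deliberate simplification of KIV (no RKHS, no ridge tuning), and the data-generating process is invented:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n)                   # exogenous instrument
u = rng.normal(size=n)                   # endogeneity source
x = z + u                                # endogenous regressor
y = 0.5 * x + 0.3 * x**2 + u + 0.5 * rng.normal(size=n)

# Linear-in-parameters IV: each basis function of x is treated as a
# separate endogenous regressor, instrumented by powers of z.
X = np.column_stack([np.ones(n), x, x**2])
Z = np.column_stack([np.ones(n), z, z**2, z**3])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_hat = np.linalg.lstsq(X_hat, y, rcond=None)[0]
```

Because $z$ is independent of $u$, every polynomial of $z$ is a valid instrument, so the expanded 2SLS recovers the structural coefficients (0, 0.5, 0.3) despite the nonlinearity.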

4. Robustness and Double-Robust Estimation

Model misspecification can inflate the variance of 2SLS or reintroduce bias.

  • Double-Robust G-Estimators: Double-robust G-methods achieve consistency if either the instrument-assignment model $f(Z \mid C)$ or the outcome model $m_y(C;\theta)$ is correctly specified. The semiparametric efficient estimator within this class uses an index $e_{\mathrm{opt}}(Z,C)$ involving conditional variances and conditional means. Efficiency-maximized and bias-reduced G-estimators further improve finite-sample performance under local misspecification (Vansteelandt et al., 2015).
  • Control Function and Pretest Approaches: In nonlinear models, the control-function (CF) estimator augments the second-stage regression with functions of first-stage residuals. CF and 2SLS are equivalent in linear contexts, but diverge otherwise. Hausman-type pretests can adaptively choose between CF and 2SLS, optimizing for worst-case MSE under unknown validity of augmented instruments (Guo et al., 2016).
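The linear-model equivalence of the control-function and 2SLS estimators can be checked numerically; the simulated design below is invented for the comparison:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
x = z @ np.array([1.0, -0.5]) + u
y = 1.5 * x + 0.7 * u + rng.normal(size=n)

# Control function: include first-stage residuals in the outcome OLS.
Z1 = np.column_stack([np.ones(n), z])
v_hat = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
W = np.column_stack([np.ones(n), x, v_hat])
cf_coef = np.linalg.lstsq(W, y, rcond=None)[0][1]    # coefficient on x

# 2SLS on the same data: numerically identical in the linear model.
X1 = np.column_stack([np.ones(n), x])
X_hat = Z1 @ np.linalg.lstsq(Z1, X1, rcond=None)[0]
tsls_coef = np.linalg.lstsq(X_hat, y, rcond=None)[0][1]
```

Since $x = \hat x + \hat v$, partialling the residual $\hat v$ out of $x$ leaves exactly the first-stage fitted values, which is why the two coefficients coincide here and diverge only in nonlinear models.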

5. IV-LS and Treatment Effect Heterogeneity

Addressing structural and causal heterogeneity requires extended IV-LS frameworks.

  • Interacted 2SLS for Heterogeneity: Interacted two-stage least squares incorporates treatment-covariate interaction terms and their corresponding instrument interactions. Under either (i) linearity of instrument-covariate interactions or (ii) correct linear outcome modeling, the estimator delivers consistent projections of heterogeneous local average treatment effects (LATE) among compliers (Zhao et al., 1 Feb 2025). If these conditions fail, stratification by the IV propensity score approximates the heterogeneity structure.
  • Weighting Interpretation: The weights in interacted 2SLS admit a formal connection to the Abadie (2003) weighting scheme for compliers; necessary and sufficient conditions for identification highlight the relevance of instrument-covariate linearity assumptions.
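A sketch of the interacted-2SLS construction under condition (ii), a correctly specified linear outcome model; the simulated design and effect sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
w = rng.normal(size=n)                 # observed covariate
z = rng.normal(size=n)                 # instrument
u = rng.normal(size=n)                 # endogeneity source
d = z + 0.3 * w + u                    # endogenous treatment intensity
y = (1.0 + 0.5 * w) * d + w + 0.8 * u + rng.normal(size=n)

# Interacted design: treatment and treatment-covariate interaction,
# instrumented by z and z*w (w itself is exogenous).
X = np.column_stack([np.ones(n), w, d, d * w])
Z = np.column_stack([np.ones(n), w, z, z * w])
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]
# coef[2]: baseline treatment effect; coef[3]: interaction slope.
```

Because the effect of $d$ varies linearly in $w$, the interacted system recovers both the baseline effect (1.0) and the heterogeneity slope (0.5), which a plain 2SLS with a single treatment column would average over.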

6. Online, Adaptive, and High-Dimensional IV-LS

Adaptation to online data streams and large dimensions drives new IV-LS algorithms.

  • Online 2SLS (O2SLS): In sequential settings, O2SLS recursively updates both first- and second-stage sufficient statistics, maintaining adaptively regularized Gram matrices. The method achieves $\mathcal O(d_x d_z \log^2 T)$ identification regret and $\widetilde{\mathcal O}(\gamma \sqrt{d_z T})$ oracle regret, where $\gamma$ quantifies the degree of endogeneity (Vecchia et al., 2023). When embedded in an optimistic linear bandit (OFUL-IV), the regret bound matches the exogenous lower bound.
  • Bias-variance Tradeoffs and High-dimensionality: IV-LS extensions now routinely incorporate regularization (ridge, lasso, group penalties), cross-validation for tuning parameter selection, and shrinkage, all underpinned by nonstandard asymptotic theory when the tuning itself is estimated from the data (Sengupta et al., 2019).
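A minimal streaming sketch of the recursive idea; the class and statistic names are illustrative, and the fixed ridge term stands in for the paper's adaptive regularization:

```python
import numpy as np

class OnlineTSLS:
    """Sketch of an online 2SLS-style estimator: the sufficient
    statistics Z'Z, Z'X, Z'y are accumulated one sample at a time,
    so the estimate can be refreshed at any point in the stream."""

    def __init__(self, dx, dz, reg=1.0):
        self.Szz = reg * np.eye(dz)    # regularized first-stage Gram
        self.Szx = np.zeros((dz, dx))
        self.Szy = np.zeros(dz)
        self.reg = reg

    def update(self, z, x, y):
        self.Szz += np.outer(z, z)
        self.Szx += np.outer(z, x)
        self.Szy += y * z

    def estimate(self):
        A = self.Szx.T @ np.linalg.solve(self.Szz, self.Szx)
        b = self.Szx.T @ np.linalg.solve(self.Szz, self.Szy)
        return np.linalg.solve(A + self.reg * np.eye(A.shape[0]), b)

# Stream simulated endogenous data through the estimator.
rng = np.random.default_rng(5)
model = OnlineTSLS(dx=1, dz=2)
for _ in range(20_000):
    z = rng.normal(size=2)
    u = rng.normal()
    x = np.array([z @ np.array([1.0, 0.5]) + u])
    y = 2.0 * x[0] + 0.8 * u + rng.normal()
    model.update(z, x, y)
beta_stream = model.estimate()
```

The ridge term dominates only early in the stream; once enough samples accumulate, the estimate matches batch 2SLS up to the small regularization bias.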

7. Applications and Practical Considerations

IV-LS methods have been empirically validated in a range of settings:

  • In education economics, CLS and 2SLS provide consistent but different estimates for the returns to schooling, with CLS exhibiting reduced MSE for weak instruments (Ginestet et al., 2015).
  • In functional regression, IV-LS corrections are essential for unbiased inference where scalar outcomes depend on error-prone functional predictors (Chen et al., 15 Sep 2025).
  • In nonparametric or nonlinear settings, KIV and related methods consistently outperform linear IV-LS when the underlying relationships depart from linearity (Singh et al., 2019).

Key practical recommendations include:

  • Careful instrument selection and evaluation of their strength and exogeneity.
  • Employing regularized or shrinkage-based variants in low-signal or high-dimensional settings.
  • Using bootstrap or robust (sandwich) variance estimators to account for tuning parameter estimation and model uncertainty.

References

  • "The Ridge Path Estimator for Linear Instrumental Variables" (Sengupta et al., 2019)
  • "Convex Combination of Ordinary Least Squares and Two-stage Least Squares Estimators" (Ginestet et al., 2015)
  • "Control Function Instrumental Variable Estimation of Nonlinear Causal Effect Models" (Guo et al., 2016)
  • "Instrumental variables system identification with LpL^p consistency" (Kuang et al., 12 Nov 2025)
  • "Interacted two-stage least squares with treatment effect heterogeneity" (Zhao et al., 1 Feb 2025)
  • "Kernel Instrumental Variable Regression" (Singh et al., 2019)
  • "Least squares-based methods to bias adjustment in scalar-on-function regression model using a functional instrumental variable" (Chen et al., 15 Sep 2025)
  • "Robustness and efficiency of covariate adjusted linear instrumental variable estimators" (Vansteelandt et al., 2015)
  • "Stochastic Online Instrumental Variable Regression: Regrets for Endogeneity and Bandit Feedback" (Vecchia et al., 2023)
  • "Bias Reduction in Instrumental Variable Estimation through First-Stage Shrinkage" (Spiess, 2017)
