Two-Stage Least Squares (2SLS) in Econometrics

Updated 26 February 2026
  • 2SLS is a method to address endogeneity in regression models by using external instrumental variables and a two-stage estimation process.
  • The procedure first projects endogenous variables onto the instrument space and then regresses outcomes on the fitted values to yield consistent estimates.
  • Instrument strength is crucial; methods like the delete-d jackknife test help differentiate weak from strong instruments, ensuring robust inference.

Two-stage least squares (2SLS) is a fundamental estimation technique for linear systems of equations featuring endogeneity, particularly in econometrics and related fields. By leveraging external instrumental variables (IVs), 2SLS delivers consistent estimates of causal effects in the presence of endogenous regressors under appropriate structural and identification assumptions. Its mathematical, inferential, and practical properties have been the subject of extensive theoretical refinement, methodological extension, and critical empirical evaluation.

1. Formal Definition and Computation

Consider the canonical triangular system,

y = Y\beta + u, \quad Y = Z\Pi + V

where y is an n × 1 outcome vector, Y is an n × p matrix of endogenous regressors, Z is an n × K_n matrix of instruments (K_n possibly growing with n), and u, V are random disturbances satisfying the exogeneity condition Z ⊥ (u, V). The two-stage least squares estimator is defined as:

  • First stage: Project the endogenous regressors onto the instrument space:

\widehat{Y} = P_Z Y \quad \text{with} \quad P_Z = Z(Z'Z)^{-1}Z'

  • Second stage: Regress yy on the fitted values from the first stage,

\hat\beta_{2SLS} = (\widehat{Y}'Y)^{-1} \widehat{Y}'y = (Y'P_Z Y)^{-1} Y'P_Z y

For the special case of a single endogenous regressor and a single instrument, 2SLS collapses to a ratio of two covariances, cov(z, y)/cov(z, Y). The key requirement for point identification is that the rank of E[Z'Y] equals p.
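As a concrete sketch, the two stages above can be carried out directly with NumPy on a simulated just-identified system. The data-generating process, coefficient values, and variable names below are illustrative assumptions, not taken from the source:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 2000, 0.5

# Illustrative triangular system: one endogenous regressor, one instrument.
z = rng.normal(size=n)                  # instrument Z
v = rng.normal(size=n)                  # first-stage disturbance V
u = 0.8 * v + rng.normal(size=n)        # structural error correlated with V => endogeneity
Y = 1.0 * z + v                         # first stage: Y = Z*Pi + V
y = beta * Y + u                        # structural equation: y = Y*beta + u

# Demean so the covariance-ratio identity below holds exactly.
z, Y, y = z - z.mean(), Y - Y.mean(), y - y.mean()

# Stage 1: project Y onto the instrument space (P_Z Y for a single instrument).
Y_hat = z * (z @ Y) / (z @ z)
# Stage 2: regress y on the fitted values: (Y' P_Z Y)^{-1} Y' P_Z y.
beta_2sls = (Y_hat @ y) / (Y_hat @ Y)

# OLS is biased here because Y is endogenous (its probability limit is 0.9, not 0.5).
beta_ols = (Y @ y) / (Y @ Y)

# Just-identified, single-regressor case: 2SLS collapses to a covariance ratio.
beta_ratio = np.cov(z, y)[0, 1] / np.cov(z, Y)[0, 1]
```

With demeaned data the covariance-ratio form matches `beta_2sls` exactly, while `beta_ols` stays near its biased limit of 0.9 rather than the true 0.5.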

2. Large Sample Theory: Weak vs Strong Instruments

2SLS estimation is highly sensitive to the “strength” of the instrumental variables, governed by the rate at which the matrix Z'Z grows and by the signal in Z for Y.

Define a strength scale s_n so that Z'Z / s_n → Θ (non-singular):

  • Many-weak regime: s_n / n → 0 (“null hypothesis,” H_0).
  • Many-strong regime: s_n / n → κ_0 > 0 (“alternative,” H_1).

Under either regime, both the 2SLS and OLS estimators converge in probability, to the same limit under H_0 but to different limits under H_1. The difference

\hat\beta_{2SLS} - \hat\beta_{OLS}

has the following limiting behavior (Huang et al., 2023):

(a) Many-weak:

\sqrt{n}\,\bigl(\hat\beta_{2SLS} - \hat\beta_{OLS}\bigr) \xrightarrow{d} N(0, \Sigma_0)

with an explicit closed-form Σ_0 for p = 1.

(b) Many-strong:

\sqrt{n}\,\Bigl[\bigl(\hat\beta_{2SLS} - \hat\beta_{OLS}\bigr) - \Delta\Bigr] \xrightarrow{d} N(0, \Sigma_A)

where Δ ≠ 0 and Σ_A is block-structured.

Collapse of the difference to zero under many-weak instruments is the basis for specification testing.
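This collapse can be illustrated numerically. The sketch below uses an illustrative DGP with hypothetical parameter choices (not the authors' design) to compare the OLS/2SLS gap when the first-stage signal is vanishing relative to n versus proportional to n:

```python
import numpy as np

rng = np.random.default_rng(1)

def estimators(n, K, pi):
    """Return (beta_OLS, beta_2SLS) for y = 0.5*Y + u, Y = Z@Pi + v,
    with K instruments sharing a common first-stage slope pi."""
    Z = rng.normal(size=(n, K))
    v = rng.normal(size=n)
    u = 0.8 * v + rng.normal(size=n)              # endogeneity via corr(u, v)
    Y = Z @ np.full(K, pi) + v
    y = 0.5 * Y + u
    b_ols = (Y @ y) / (Y @ Y)
    PzY = Z @ np.linalg.solve(Z.T @ Z, Z.T @ Y)   # P_Z Y
    b_2sls = (PzY @ y) / (PzY @ Y)
    return b_ols, b_2sls

n, K = 4000, 800
# Many-weak regime: first-stage signal vanishing relative to n;
# OLS and 2SLS drift toward the same (biased) limit.
ols_w, tsls_w = estimators(n, K, pi=0.002)
# Many-strong regime: signal proportional to n; a gap Delta != 0 persists.
ols_s, tsls_s = estimators(n, K, pi=K ** -0.5)
```

In the first call the two estimates nearly coincide, while in the second their difference stays bounded away from zero, matching the Δ ≠ 0 limit above.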

3. Instrument Strength Testing: Delete-d Jackknife Procedure

To empirically distinguish between the weak and strong instrument regimes, Huang, Wang, and Yao (Huang et al., 2023) propose a specification test based on the distribution of β̂_2SLS − β̂_OLS:

  • Delete-d jackknife: Given a fraction λ, delete d = ⌊λn⌋ cases repeatedly to form m random subsamples of size r = n − d. For each subsample s, compute the difference

\theta_s = \hat\beta^{2SLS}_{(s)} - \hat\beta^{OLS}_{(s)}

The sampling covariance estimator becomes

\widehat\Sigma_0^{S} = \frac{n\,r}{d\,m} \sum_{s=1}^m (\theta_s - \bar\theta)(\theta_s - \bar\theta)'

  • Test statistic:

T_n = \theta_n'\,(\widehat\Sigma_0^{S})^{-1}\,\theta_n

Under H_0, T_n is asymptotically χ²_p distributed.
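A minimal sketch of the procedure for p = 1 follows. The DGP is illustrative, and the choices of λ, m, and the √n scaling of θ_n are assumptions made to match the Σ̂_0 formula above, not the authors' exact implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def theta(y, Y, Z):
    """beta_2SLS - beta_OLS for a single endogenous regressor (p = 1)."""
    PzY = Z @ np.linalg.solve(Z.T @ Z, Z.T @ Y)   # P_Z Y
    return (PzY @ y) / (PzY @ Y) - (Y @ y) / (Y @ Y)

def jackknife_test(y, Y, Z, lam=0.1, m=200):
    """Delete-d jackknife specification test; returns T_n (compare to chi2_1)."""
    n = len(y)
    d = int(lam * n)                              # d = floor(lambda * n)
    r = n - d
    thetas = np.empty(m)
    for s in range(m):
        keep = rng.choice(n, size=r, replace=False)   # delete d cases at random
        thetas[s] = theta(y[keep], Y[keep], Z[keep])
    # Sigma_hat_0^S = (n r / (d m)) * sum_s (theta_s - theta_bar)^2
    sigma0 = (n * r) / (d * m) * np.sum((thetas - thetas.mean()) ** 2)
    # Assumed convention: Sigma_hat_0^S estimates the variance of the
    # sqrt(n)-scaled difference, so theta_n enters T_n scaled by sqrt(n).
    return n * theta(y, Y, Z) ** 2 / sigma0

# Strong-instrument example: the OLS/2SLS gap is real, so T_n should exceed
# the 5% chi-square(1) critical value of 3.841.
n = 1000
z = rng.normal(size=(n, 1))
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(size=n)
Y = z[:, 0] + v
y = 0.5 * Y + u
T_n = jackknife_test(y, Y, z)
```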

Monte Carlo experiments confirm the size and power of the procedure across various simulated instrument strengths, dimensions, instrument-count ratios, and error distributions (Huang et al., 2023).

4. Empirical Illustration: Angrist–Krueger Returns to Education

Application to large-scale IV settings (e.g., Angrist–Krueger’s quarter-of-birth design with hundreds of instruments) demonstrates the utility of the test:

  • Full-sample OLS and 2SLS estimates nearly coincide (e.g., 0.0483 vs 0.0514). First-stage F-statistics rule out completely weak instruments, yet the specification test does not reject the many-weak null at conventional levels, indicating moderate weakness rather than many-strong instruments.
  • The key implication is that the identification leverage from such a large instrument set is limited: the effective difference between the OLS and 2SLS estimators disappears, empirically confirming the theoretical predictions.

5. Implications for Specification, Robustness, and Inference

Failure to adjust for instrument strength leads to consequential inferential errors:

  • If instruments are many and weak: the variance of 2SLS is large, conventional confidence intervals fail to attain nominal coverage, and point estimates may be highly unstable or statistically indistinguishable from OLS.
  • Delete-d jackknife variance: standard plug-in or sandwich formulas are inconsistent in many-weak settings because the identifying signal is degenerate. The delete-d jackknife delivers a robust, computationally feasible variance estimator under both strong and weak instrument regimes.
  • Monte Carlo evidence: coverage rates, empirical rejection rates, and finite-sample performance hold up across single and multiple endogenous regressors, under both Gaussian and heavy-tailed DGPs (Huang et al., 2023).

6. Guidelines and Extensions

The two-stage least squares estimator, while structurally simple, must be applied judiciously in regimes where the instrument count is large relative to sample size. The delete-d jackknife procedure resolves the variance estimation problem where classic formulas and other plug-in estimators fail. This approach is robust to multiple endogenous regressors, non-Gaussian errors, and high-dimensional instrument spaces.

Summary guidelines:

  • Compute both OLS and 2SLS estimators.
  • Form the test statistic T_n based on the delete-d jackknife.
  • Use the result of the specification test to rule out or confirm effective instrument weakness, and guide subsequent inference.
  • In massive-instrument designs, do not rely on standard variance formulas or naive plug-in approaches.

Theoretical work and empirical evidence in (Huang et al., 2023) collectively reposition 2SLS in high-dimensional IV settings by providing nontrivial, computable diagnostics for distinguishing genuinely strong from moderately weak instrument sets, ensuring valid inference in modern applied contexts.
