ivmodels Software: Robust IV Analysis Tools
- ivmodels is a comprehensive software package for instrumental variables analysis, integrating k-class estimators and robust methods to handle weak instruments.
- The package implements advanced inference procedures such as AR, CLR, LR, and LM tests to construct confidence intervals that remain valid even under weak instrument conditions.
- ivmodels also offers practical tools for power calculations, sensitivity analysis on instrument exogeneity, and diagnostic tests for model specification and instrument strength.
The ivmodels software package is a suite of statistical tools for conducting instrumental variables (IV) analysis, emphasizing robust estimation and inference in the presence of weak instruments. Originally implemented in R and more recently available in Python, ivmodels introduces a unified framework for k-class estimators, weak-instrument-robust confidence intervals, power and sample size calculations, sensitivity analysis to violations of instrument exogeneity, and diagnostic procedures for specification testing. It is designed for settings with one endogenous variable and supports both matrix and formula-based interfaces for practical empirical research.
1. K-class Estimators and Framework
At the core of ivmodels is the implementation of k-class estimators, a general family that includes several classical methods as special cases. The k-class estimator is defined by a parameter $\kappa$ that controls how much weight is placed on the instrument-projected data relative to the raw data:
$$\hat\beta(\kappa) = \Big( X^\top \big( \kappa P_Z + (1-\kappa)\,\mathrm{Id} \big) X \Big)^{-1} X^\top \big( \kappa P_Z + (1-\kappa)\,\mathrm{Id} \big)\, y,$$
where $y$ and $X$ are the residualized outcome and endogenous regressor after partialling out covariates, and $P_Z = Z(Z^\top Z)^{-1}Z^\top$ is the projection onto the (residualized) instruments $Z$.
Special cases include:
- $\kappa = 0$: Ordinary Least Squares (OLS)
- $\kappa = 1$: Two-Stage Least Squares (TSLS)
- $\kappa = \hat\kappa_{\mathrm{LIML}} \geq 1$ (the smallest root of the associated eigenvalue problem): Limited Information Maximum Likelihood (LIML)
- A small downward adjustment from $\hat\kappa_{\mathrm{LIML}}$, of order $1/n$: Fuller's estimator
While the k-class estimators with $\kappa$ close to one (TSLS, LIML, and Fuller) are asymptotically equivalent, i.e., share the same limiting distribution under strong-instrument asymptotics, their finite-sample properties can differ substantially, especially under weak instrument conditions. In particular, TSLS can be substantially biased toward the OLS estimand, while LIML and Fuller's estimator display greater robustness to instrument weakness. A minimal sketch of the estimator family follows below.
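To make the estimator family concrete, the following minimal NumPy/SciPy sketch (illustrative code, not the package's own implementation) computes the k-class estimate for a given $\kappa$ and the LIML value of $\kappa$ via the standard generalized eigenvalue problem; variable names are assumptions for illustration, with $y$, $X$, $Z$ already residualized on covariates.

```python
import numpy as np
from scipy.linalg import eigvalsh

def k_class_estimate(y, X, Z, kappa):
    """k-class estimate: (X'(kappa P_Z + (1-kappa) I) X)^{-1} X'(kappa P_Z + (1-kappa) I) y.

    y: (n,) outcome, X: (n, p) endogenous regressors (p = 1 in ivmodels'
    single-endogenous setting), Z: (n, q) instruments, all residualized on covariates.
    """
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)          # projection P_Z onto the instruments
    W = kappa * P + (1 - kappa) * np.eye(len(y))   # kappa-weighted mix of P_Z and identity
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

def kappa_liml(y, X, Z):
    """LIML kappa: smallest root of det(A'A - kappa * A'M_Z A) = 0 with A = [y, X]."""
    A = np.column_stack([y, X])
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    M = np.eye(len(y)) - P                         # annihilator M_Z = I - P_Z
    return eigvalsh(A.T @ A, A.T @ M @ A)[0]       # smallest generalized eigenvalue

# e.g.: beta_tsls = k_class_estimate(y, X, Z, kappa=1.0)
#       beta_liml = k_class_estimate(y, X, Z, kappa=kappa_liml(y, X, Z))
```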
2. Weak-Instrument-Robust Inference Procedures
A main contribution of ivmodels lies in its robust inference machinery, explicitly addressing weak and potentially near-redundant instruments. The software offers several confidence set construction methods via test inversion:
- Anderson–Rubin (AR) Test: Constructs a test statistic from the instrument-projected residuals under the null; its null distribution does not depend on instrument strength, being exact $F_{q,\,n-q}$ under normal homoskedastic errors and chi-squared (after scaling by $q$) asymptotically. The AR statistic for the hypothesis $H_0: \beta = \beta_0$ can be written as:
$$\mathrm{AR}(\beta_0) = \frac{n - q}{q} \cdot \frac{(y - X\beta_0)^\top P_Z\,(y - X\beta_0)}{(y - X\beta_0)^\top M_Z\,(y - X\beta_0)},$$
where $q$ is the number of instruments, $P_Z$ is the projection onto the instruments, and $M_Z = \mathrm{Id} - P_Z$.
- Conditional Likelihood Ratio (CLR) Test: Combines the AR statistic with a measure of instrument strength to produce confidence sets that maintain correct coverage regardless of instrument strength. Up to the exact finite-sample scaling, the underlying likelihood-ratio statistic can be written as
$$\mathrm{LR}(\beta_0) = q\,\mathrm{AR}(\beta_0) - \min_{\beta}\, q\,\mathrm{AR}(\beta),$$
with the minimum attained at the LIML estimate; the CLR test compares $\mathrm{LR}(\beta_0)$ to a critical value computed conditional on a statistic measuring instrument strength (Moreira, 2003).
- Likelihood Ratio (LR) and Lagrange Multiplier (LM) Tests: Additional procedures implemented in the Python package that yield further confidence sets; inverting the LM test in particular can produce a union of ellipsoids that may be disjoint yet still retains valid coverage.
By inverting these test statistics (computing the set of values $\beta_0$ for which the hypothesis $H_0: \beta = \beta_0$ is not rejected at level $\alpha$), ivmodels delivers confidence sets, typically intervals, with at least nominal asymptotic coverage under both strong- and weak-instrument asymptotics.
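As a concrete illustration of test inversion, the sketch below (NumPy/SciPy, not the package's own implementation) computes the AR statistic defined above for a single endogenous regressor, here denoted x, and collects the grid points $\beta_0$ at which the F-based test does not reject; the grid bounds are an assumption the user must supply.

```python
import numpy as np
from scipy import stats

def ar_statistic(y, x, Z, beta0):
    """AR(beta0) = (n - q)/q * r'P_Z r / r'M_Z r with r = y - beta0 * x (one endogenous regressor)."""
    n, q = Z.shape
    r = y - beta0 * x
    Pr = Z @ np.linalg.solve(Z.T @ Z, Z.T @ r)   # P_Z r
    num = r @ Pr                                 # r' P_Z r
    den = r @ r - num                            # r' M_Z r
    return (n - q) / q * num / den

def ar_confidence_set(y, x, Z, grid, alpha=0.05):
    """Invert the AR test: keep the grid points beta0 that are not rejected at level alpha."""
    n, q = Z.shape
    crit = stats.f.ppf(1 - alpha, q, n - q)      # F(q, n - q) critical value
    return np.array([b for b in grid if ar_statistic(y, x, Z, b) <= crit])

# e.g.: cs = ar_confidence_set(y, x, Z, grid=np.linspace(-2.0, 2.0, 2001))
```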
3. Power and Sample Size Calculations
The package provides explicit, finite-sample power formulas for key test statistics:
- For the TSLS estimator in the simplest setting with no covariates, power against an alternative $\beta \neq \beta_0$ at significance level $\alpha$ is approximated via the standard Normal distribution,
$$\mathrm{power}(\beta) \approx \Phi\!\left(\frac{|\beta - \beta_0|}{\mathrm{se}(\hat\beta_{\mathrm{TSLS}})} - z_{1-\alpha/2}\right),$$
where $\mathrm{se}(\hat\beta_{\mathrm{TSLS}})$ is the asymptotic standard error, shrinking at rate $1/\sqrt{n}$.
- For the AR test, power against an alternative is computed from the noncentral chi-squared (or noncentral F) distribution with $q$ degrees of freedom, whose noncentrality parameter grows with the sample size, the instrument strength, and the distance $|\beta - \beta_0|$.
Such analytical formulas allow users to determine the minimum sample size required for a desired power and to evaluate trade-offs across test procedures.
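A minimal sketch of such a calculation (illustrative, not the package's formulas) computes the power of an AR-style chi-squared test for a user-supplied noncentrality and searches for the smallest sample size reaching a target power, under the assumption that the noncentrality parameter scales linearly in $n$ (in practice it is determined by instrument strength and error variances).

```python
import numpy as np
from scipy import stats

def ar_power(lmbda, q, alpha=0.05):
    """Power of a level-alpha chi-squared test with q degrees of freedom and noncentrality lmbda."""
    crit = stats.chi2.ppf(1 - alpha, df=q)
    return 1 - stats.ncx2.cdf(crit, df=q, nc=lmbda)

def min_sample_size(lmbda_per_obs, q, target=0.8, alpha=0.05, n_max=100_000):
    """Smallest n reaching the target power, assuming noncentrality = n * lmbda_per_obs."""
    n = np.arange(2, n_max)
    power = ar_power(n * lmbda_per_obs, q, alpha)   # vectorized over candidate sample sizes
    hit = np.nonzero(power >= target)[0]
    return int(n[hit[0]]) if hit.size else None

# e.g.: min_sample_size(lmbda_per_obs=0.01, q=2)
```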
4. Sensitivity Analysis for Instrument Exogeneity
ivmodels incorporates methods to formally investigate the impact of potential violations of the instrument exogeneity assumption. The structural model is augmented with a direct effect of the instruments on the outcome:
$$y = X\beta + Z\delta + \varepsilon.$$
Here, $\delta$ quantifies the deviation (“invalidity”) from the ideal IV assumptions; $\delta = 0$ recovers the exclusion restriction. The user specifies a range $\|\delta\| \le \delta_{\max}$, and sensitivity intervals are constructed by adjusting the AR test statistic to account for this range. The confidence interval becomes the union of the inverted AR tests over the allowed invalidity,
$$\mathrm{CI}_{1-\alpha}(\delta_{\max}) = \bigcup_{\|\delta\| \le \delta_{\max}} \left\{ \beta_0 : \mathrm{AR}_{\delta}(\beta_0) \le F^{-1}_{q,\,n-q}(1-\alpha) \right\},$$
where $\mathrm{AR}_{\delta}(\beta_0)$ denotes the AR statistic computed from the adjusted residuals $y - X\beta_0 - Z\delta$.
This approach quantifies robustness: if even small values of $\delta_{\max}$ substantially widen the intervals or render results insignificant, the inferences are sensitive to plausible violations of the exclusion restriction.
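A minimal sketch of this union-over-invalidity construction (again NumPy/SciPy with illustrative names, not the package's API), for one instrument z and a scalar invalidity parameter delta, is:

```python
import numpy as np
from scipy import stats

def ar_statistic_adjusted(y, x, z, beta0, delta):
    """AR statistic for adjusted residuals r = y - beta0 * x - delta * z (one instrument)."""
    n, q = len(y), 1
    r = y - beta0 * x - delta * z
    num = (z @ r) ** 2 / (z @ z)          # r' P_z r for a single instrument z
    den = r @ r - num                     # r' M_z r
    return (n - q) / q * num / den

def sensitivity_interval(y, x, z, beta_grid, delta_max, alpha=0.05, n_delta=41):
    """Hull of the union over |delta| <= delta_max of the inverted AR tests on a beta grid."""
    n = len(y)
    crit = stats.f.ppf(1 - alpha, 1, n - 1)
    keep = np.zeros(len(beta_grid), dtype=bool)
    for delta in np.linspace(-delta_max, delta_max, n_delta):
        stat = np.array([ar_statistic_adjusted(y, x, z, b, delta) for b in beta_grid])
        keep |= stat <= crit              # accumulate non-rejected grid points
    betas = beta_grid[keep]
    return (betas.min(), betas.max()) if betas.size else None
```

Widening delta_max and re-running the construction traces out how quickly the interval grows as the assumed invalidity increases.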
5. Diagnostic Tools and Model Specification
Model specification and instrument strength are addressed through familiar diagnostic statistics:
- Sargan’s statistic: For overidentification testing; with a single endogenous regressor it is asymptotically $\chi^2_{q-1}$ under the null of valid instruments, with the same interpretation whether TSLS or LIML residuals are used.
- Cragg–Donald statistic / First-stage $F$-test: Evaluates the relevance (strength) of the instruments; the practical rule-of-thumb threshold of $F > 10$ is formally justified.
- Residual–prediction test: Uses sample splitting and potentially machine-learning-based regressions to check independence of the residual from instruments, serving as a check on exogeneity.
These tests, available in both R and Python interfaces, guide users in diagnosing violations of the core IV assumptions.
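For concreteness, a small sketch (illustrative NumPy/SciPy code, not the package's implementation) of two of these diagnostics for a single endogenous regressor, assuming all variables are already residualized on covariates:

```python
import numpy as np
from scipy import stats

def first_stage_f(x, Z):
    """First-stage F: regress the endogenous regressor x on the instruments Z."""
    n, q = Z.shape
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    xhat = Z @ coef
    ess = xhat @ xhat                     # explained sum of squares
    rss = (x - xhat) @ (x - xhat)         # residual sum of squares
    return (ess / q) / (rss / (n - q))

def sargan_statistic(y, x, Z):
    """Sargan statistic: n * R^2 of TSLS residuals on Z; ~ chi2(q - 1) under the null."""
    n, q = Z.shape
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    xhat = Z @ coef                       # first-stage fitted values
    beta_tsls = (xhat @ y) / (xhat @ x)   # TSLS with one endogenous regressor
    u = y - beta_tsls * x                 # structural residuals
    uhat = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
    stat = n * (uhat @ uhat) / (u @ u)    # n * R^2 of u on Z
    return stat, stats.chi2.sf(stat, df=q - 1)
```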
6. Practical Implementation and Use Cases
ivmodels can be used via formula notation (R) or an array-based programmatic interface (Python). Users provide data for the outcome ($y$), the endogenous variable ($X$), the instruments ($Z$), and exogenous controls ($C$); the package then carries out estimation, inference, and the diagnostic routines described above.
A canonical application, demonstrated in the package documentation, revisits Card (1995) on the returns to education, using proximity to a four-year college as an instrument for education. After partialling out geographic and demographic covariates, the user assesses instrument strength (first-stage $F$-statistic), fits the OLS, TSLS, LIML, and Fuller estimators, and constructs both standard and weak-instrument-robust confidence intervals. Sensitivity analysis with respect to geographic confounders (“south”/“smsa”) further quantifies how robust the causal estimates are to potential instrument invalidity. Diagnostic plots contrast IV and non-IV estimators and show power as a function of sample size.
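The partialling-out step used throughout this workflow has a simple closed form (Frisch–Waugh–Lovell residualization); a minimal NumPy sketch with illustrative names is:

```python
import numpy as np

def partial_out(A, C):
    """Residualize the columns of A on the covariates C (an intercept is added)."""
    C1 = np.column_stack([np.ones(len(C)), C])      # covariates plus intercept
    coef, *_ = np.linalg.lstsq(C1, A, rcond=None)   # least-squares fit of A on C
    return A - C1 @ coef                            # residuals of A, orthogonal to C

# e.g. residualize outcome, endogenous regressor, and instruments before IV analysis:
# y_r, x_r, Z_r = (partial_out(a, C) for a in (y, x, Z))
```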
7. Relationship to Related Software and Methods
ivmodels differs conceptually from packages such as RobustIV and controlfunctionIV (Koo et al., 2023), which allow for (potentially high-dimensional) settings with many possibly invalid instruments. Those packages implement two-stage hard thresholding and robust selection/voting schemes to identify valid instruments among many candidates and offer uniformly valid inference even under imperfect selection. By contrast, ivmodels assumes all candidate instruments satisfy the core IV assumptions and focuses on robustness to weak instrument relevance, rather than invalid instruments per se.
The Python implementation of ivmodels (Londschien, 17 Aug 2025) extends the core statistical algorithms with comprehensive code examples and notebook-style demonstrations, facilitating reproducible empirical work and further technical extensions. Its primary focus remains weak-instrument-robust estimation, inference, and specification testing within the classical IV modeling paradigm.
ivmodels thus provides a comprehensive suite of methods for IV analysis, emphasizing weak-instrument robustness, sensitivity to exogeneity violations, and practical diagnostics for empirical research with a single endogenous regressor. Its workflow is grounded in established econometric theory and illustrated on influential empirical datasets.