Econometric Specification Tests

Updated 10 June 2026

Econometric specification tests are a suite of procedures that assess whether a model's assumptions, such as its functional form and distribution, are correctly specified.
They employ various techniques including empirical process methods, overidentification tests, and bootstrap procedures to ensure reliable inference.
These tests guide model selection and robust analysis in complex settings, covering parametric, semiparametric, and nonparametric frameworks.

Econometric specification tests comprise a diverse collection of formal procedures designed to assess the adequacy of functional forms, distributional assumptions, structural restrictions, and implicit identifying conditions in parametric, semiparametric, and nonparametric econometric models. Rigorous specification testing is essential for validating empirical economic models, informing the choice of estimators, guiding robust inference, and supporting credible structural interpretations.

1. General Principles and Theoretical Framework

Specification testing evaluates the null hypothesis that the empirical model under consideration adheres to a particular set of restrictions, such as linearity, correct functional form, distributional assumptions, or the validity of exclusion restrictions or identifying conditions. This often takes the form

$H_0: P \in \mathcal{P}_0$

where $\mathcal{P}_0$ is a subset of probability measures consistent with the maintained model, and $P$ is the true data-generating mechanism. Tests typically exploit the structure of estimators or residual processes, comparing models with and without added complexity or different assumptions.

Specification tests must account for the sampling properties of both parametric and semiparametric estimators, the potential dependence between the test statistic and the underlying estimators, and the limitations posed by finite samples or nuisance parameters. Importantly, for broad classes of convex, symmetric specification tests (including those based on quadratic forms, suprema, or empirical process norms), conditional coverage is never anti-conservative: post-test confidence sets maintain at least nominal coverage under the null, regardless of the dependence structure between the test statistic and the estimator (Chaisemartin et al., 2024).

2. Parametric Model Specification: Classical and Modern Approaches

General Regression and Distributional Models

In classical regression settings, specification tests include the Ramsey RESET test (omitted polynomial terms), the White and Breusch-Pagan heteroskedasticity tests, and the family of overidentification J-tests for GMM. The latter can be cast as convex symmetric functionals of the empirical process of moment restrictions (Chaisemartin et al., 2024): $J_n = n \cdot g(\hat\theta_n)' W_n g(\hat\theta_n) \xrightarrow{d} \chi^2_{q-p}$, where $g(\hat\theta_n)$ is the sample mean of the moment functions evaluated at the estimator, $W_n$ is the (efficient) weighting matrix, $q$ is the number of moment conditions, and $p$ is the number of estimated parameters.
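
The J-statistic above can be sketched for a linear IV model. The following is a minimal illustration, not a reference implementation: the function name `linear_gmm_j_test`, the simulated design, and the choice of a heteroskedasticity-robust second-step weighting matrix are all assumptions made for this example.

```python
import numpy as np

def linear_gmm_j_test(y, X, Z):
    """Two-step GMM for y = X b + u with instruments Z; returns (b_hat, J, df)."""
    n, q = Z.shape
    p = X.shape[1]
    ZX, Zy = Z.T @ X / n, Z.T @ y / n
    # Step 1: 2SLS, i.e. GMM with weighting matrix (Z'Z/n)^{-1}
    W1 = np.linalg.inv(Z.T @ Z / n)
    b1 = np.linalg.solve(ZX.T @ W1 @ ZX, ZX.T @ W1 @ Zy)
    # Step 2: weighting matrix = inverse of the estimated moment covariance
    u1 = y - X @ b1
    S = (Z * u1[:, None] ** 2).T @ Z / n
    W2 = np.linalg.inv(S)
    b2 = np.linalg.solve(ZX.T @ W2 @ ZX, ZX.T @ W2 @ Zy)
    g = Z.T @ (y - X @ b2) / n            # sample mean of the moment functions
    J = n * g @ W2 @ g                    # J_n = n g' W g
    return b2, float(J), q - p

# Simulated example (hypothetical design): one endogenous regressor, three instruments
rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=(n, 3))
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)          # u is correlated with x through v
x = Z @ np.array([1.0, 1.0, 1.0]) + v
X = x[:, None]
y_valid = 2.0 * x + u
y_invalid = 2.0 * x + u + 0.5 * Z[:, 2]   # third instrument enters the outcome directly

b_valid, J_valid, df = linear_gmm_j_test(y_valid, X, Z)
b_bad, J_bad, _ = linear_gmm_j_test(y_invalid, X, Z)
print(f"valid instruments:   J = {J_valid:.2f} (df = {df})")
print(f"invalid instruments: J = {J_bad:.2f} (df = {df})")
```

With $q - p = 2$, the 5% critical value of the $\chi^2_2$ distribution is about 5.99; the statistic for the invalid design should exceed it by a wide margin.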

Instrumental Variables and Weak-IV Regimes

For instrumental variables (IV) models with many instruments ($K_n \to \infty$ as $n \to \infty$), the strength of identification becomes a central concern. The difference between the 2SLS and OLS estimators,

$\Delta_n = \hat\beta_{2SLS} - \hat\beta_{OLS}$

is key. Under the null of many weak instruments ($s_n/n \to 0$), $\Delta_n$ is asymptotically centered at zero, while under strong instruments it is shifted away from zero. A valid test statistic is a Hausman-type contrast built from $\Delta_n$, whose asymptotic covariance must be estimated, typically via a delete-$d$ jackknife to handle complex covariance structures in high dimensions (Huang et al., 2023).

This test refines conventional Stock-Yogo weak instrument diagnostics by distinguishing “moderately weak” from “strong” instrument asymptotics, with direct implications for the validity of LIML, overidentification, and inference procedures.
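
To make the 2SLS–OLS contrast and its jackknife variance concrete, here is a rough sketch for a single endogenous regressor. It uses the generic delete-$d$ jackknife variance formula with randomly drawn deletion sets; it is not the specific estimator of Huang et al. (2023). The function names, the simulated design, and the choices of $d$ and the number of subsets are assumptions for illustration only.

```python
import numpy as np

def delete_d_jackknife_var(estimator, data, subsets):
    """Delete-d jackknife variance of a scalar estimator.

    `subsets` lists the index sets (each of size d) that are removed in turn.
    Passing every size-d subset gives the exact estimator; a random sample of
    subsets gives an approximation.
    """
    n = data.shape[0]
    d = len(subsets[0])
    all_idx = np.arange(n)
    reps = np.array([estimator(data[np.setdiff1d(all_idx, s)]) for s in subsets])
    return (n - d) / (d * len(subsets)) * np.sum((reps - reps.mean()) ** 2)

def tsls_minus_ols(data):
    """Delta = beta_2SLS - beta_OLS; columns of `data` are [y, x, instruments...]."""
    y, x, Z = data[:, 0], data[:, 1], data[:, 2:]
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first-stage fitted values
    return (x_hat @ y) / (x_hat @ x) - (x @ y) / (x @ x)

# Simulated example (hypothetical design, no intercept, mean-zero variables)
rng = np.random.default_rng(1)
n, K, d = 400, 5, 20
Z = rng.normal(size=(n, K))
v = rng.normal(size=n)
x = Z @ np.full(K, 0.5) + v
y = 1.0 * x + 0.8 * v + rng.normal(size=n)             # x is endogenous
data = np.column_stack([y, x, Z])

subsets = [rng.choice(n, size=d, replace=False) for _ in range(200)]
delta = tsls_minus_ols(data)
se = np.sqrt(delete_d_jackknife_var(tsls_minus_ols, data, subsets))
print(f"Delta_n = {delta:.3f}, jackknife s.e. = {se:.3f}, ratio = {delta / se:.2f}")
```

As a sanity check on the formula, with $d = 1$ and all $n$ deletion sets the jackknife variance of a sample mean reduces exactly to $s^2/n$.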

3. Empirical Process-Based and Goodness-of-Fit Tests

Marked Empirical Process and Cramér–von Mises Functionals

A unifying framework for modern specification tests uses empirical processes marked by model residuals. For diffusion models with latent stochastic volatility, goodness-of-fit is assessed via marked processes over drift or diffusion functions, with statistics of Kolmogorov–Smirnov (KS) type, based on the supremum of the process, or of Cramér–von Mises (CvM) type, based on its integrated square. Bootstrap calibration is generally required because the asymptotic distributions depend on nuisance parameters and latent states (López-Pérez et al., 2022; Cavaliere et al., 2021).
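
The marked-process idea is easiest to see in a much simpler setting than the diffusion models discussed above. The sketch below tests linearity of a cross-sectional regression using the residual-marked process $R_n(t) = n^{-1/2} \sum_i \hat e_i \, 1\{x_i \le t\}$, with KS and CvM functionals calibrated by a wild bootstrap. The regression setting, the function names, and the Rademacher multipliers are assumptions chosen for illustration; this is not the procedure of any of the cited papers.

```python
import numpy as np

def marked_process_stats(x, resid):
    """KS and CvM functionals of R_n(t), evaluated at the sample points t = x_j."""
    n = len(x)
    indic = (x[:, None] <= x[None, :]).astype(float)   # indic[i, j] = 1{x_i <= x_j}
    R = resid @ indic / np.sqrt(n)
    return np.max(np.abs(R)), np.mean(R ** 2)

def linear_spec_test(x, y, n_boot=200, seed=0):
    """Wild-bootstrap p-values for H0: E[y | x] is linear in x."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    fit = lambda yy: X @ np.linalg.lstsq(X, yy, rcond=None)[0]
    y_hat = fit(y)
    resid = y - y_hat
    ks, cvm = marked_process_stats(x, resid)
    boot = np.empty((n_boot, 2))
    for b in range(n_boot):
        v = rng.choice([-1.0, 1.0], size=len(x))       # Rademacher multipliers
        y_star = y_hat + resid * v                     # bootstrap sample satisfies H0
        # Re-estimate on each bootstrap sample so estimation error is mimicked
        boot[b] = marked_process_stats(x, y_star - fit(y_star))
    p_ks = np.mean(boot[:, 0] >= ks)
    p_cvm = np.mean(boot[:, 1] >= cvm)
    return ks, cvm, p_ks, p_cvm

# Simulated example: a correctly specified and a misspecified linear model
rng = np.random.default_rng(42)
x = rng.normal(size=400)
y_lin = 1 + x + rng.normal(size=400)
y_quad = 1 + x + 2 * x ** 2 + rng.normal(size=400)

ks0, cvm0, p_ks0, p_cvm0 = linear_spec_test(x, y_lin)
ks1, cvm1, p_ks1, p_cvm1 = linear_spec_test(x, y_quad)
print(f"linear DGP:    KS p = {p_ks0:.3f}, CvM p = {p_cvm0:.3f}")
print(f"quadratic DGP: KS p = {p_ks1:.3f}, CvM p = {p_cvm1:.3f}")
```

Re-fitting the model inside the bootstrap loop is the design choice that matters here: it lets the bootstrap distribution reflect the effect of parameter estimation on the process.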

Conditional Distribution and Finite Sample Binning

For general conditional distribution models, contingency-table-based χ² tests using the Rosenblatt transform reduce the problem to joint independence of model-based probability integral transforms and covariates. The resulting Pearson, Deviance, Chernoff–Lehmann, and generalized Wald statistics all share a χ² reference distribution with a common number of degrees of freedom, and remain valid under a wide range of binning protocols and finite sample designs, provided expected counts are not too small (Delgado et al., 2022).

Nonparametric Monotonicity and Local Alternatives

Testing qualitative shape restrictions (e.g., regression monotonicity) employs studentized U-statistics over a multi-scale grid, utilizing kernel or local weighting, and selecting subsample regions via sophisticated bootstrap and set selection procedures. Step-down methods yield adaptive, rate-optimal minimax power against both fixed and local alternatives (Chetverikov, 2012).

4. Model Classes with Structural Complexity

Panel Data and Outlier Robustness

Extending the classic Hausman test to robust specification checking in linear panels, outlier-resistant approaches use weighted likelihoods for both fixed-effects (FE) and random-effects (RE) estimation. The robust test contrasts the weighted-likelihood FE and RE estimates in a Hausman-type quadratic form. It retains an asymptotic χ² distribution and maintains correct size and power in the presence of data contamination (Beyaztas et al., 2021).
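
For orientation, the classic Hausman contrast can be written as a small helper. This is a sketch of the standard (non-robust) quadratic form only; a robust variant along the lines described above would presumably plug in weighted-likelihood FE and RE estimates and their covariance estimates, but that is an assumption and is not implemented here. The function name and the numerical inputs are hypothetical.

```python
import numpy as np

def hausman_statistic(b_consistent, b_efficient, V_consistent, V_efficient):
    """Return (H, df) with H = d' (V_c - V_e)^+ d and d = b_c - b_e.

    b_consistent / V_consistent: estimator that stays consistent under the
    alternative (e.g. fixed effects); b_efficient / V_efficient: estimator
    that is efficient under the null (e.g. random effects).
    """
    d = np.atleast_1d(b_consistent).astype(float) - np.atleast_1d(b_efficient)
    V_diff = np.atleast_2d(V_consistent).astype(float) - np.atleast_2d(V_efficient)
    # The pseudo-inverse guards against a singular covariance difference
    H = float(d @ np.linalg.pinv(V_diff) @ d)
    df = int(np.linalg.matrix_rank(V_diff))
    return H, df

# Hypothetical estimates for two slope coefficients
H, df = hausman_statistic(
    b_consistent=[1.0, 2.0],
    b_efficient=[0.8, 2.1],
    V_consistent=np.diag([0.05, 0.03]),
    V_efficient=np.diag([0.01, 0.01]),
)
print(f"H = {H:.3f} on {df} degrees of freedom")
```

The statistic is compared with a χ² critical value on `df` degrees of freedom; in finite samples the covariance difference need not be positive definite, which is why the pseudo-inverse and the rank are used.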

GARCH and Log-GARCH-Type Volatility Models

For GARCH models, specification is checked via a marked empirical process of centered squared residuals. Because the asymptotic law is non-pivotal and sensitive to nuisance parameters (especially on the boundary of the parameter space), a shrinkage bootstrap is required: parameter estimates close to zero are set to zero in the bootstrap DGP to mirror boundary behavior (Cavaliere et al., 2021). Log-GARCH and EGARCH specifications are distinguished via Lagrange multiplier tests and portmanteau tests based on squared residual autocorrelations, which show high power in both simulated and financial market data (Francq et al., 2016).
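
The shrinkage step of such a bootstrap can be sketched as follows for a GARCH(1,1) model. Only the data-generating part is shown; a full procedure would re-estimate the model and recompute the test statistic on each simulated sample. The parameter estimates, the threshold value, and the function names are hypothetical, and the rule for choosing the threshold as a function of sample size is deliberately left open.

```python
import numpy as np

def shrink_to_boundary(theta_hat, threshold):
    """Set estimates below `threshold` to exactly zero (the boundary value)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    return np.where(theta_hat < threshold, 0.0, theta_hat)

def simulate_garch11(omega, alpha, beta, n, rng, burn=200):
    """Simulate y_t = sigma_t z_t, sigma_t^2 = omega + alpha y_{t-1}^2 + beta sigma_{t-1}^2."""
    z = rng.standard_normal(n + burn)
    y = np.empty(n + burn)
    sig2 = omega / max(1.0 - alpha - beta, 1e-8)   # start at the unconditional variance
    for t in range(n + burn):
        y[t] = np.sqrt(sig2) * z[t]
        sig2 = omega + alpha * y[t] ** 2 + beta * sig2
    return y[burn:]                                 # drop the burn-in period

# Hypothetical estimates (omega, alpha, beta); beta is close to the boundary at zero
n = 1000
theta_hat = np.array([0.10, 0.08, 0.004])
alpha_s, beta_s = shrink_to_boundary(theta_hat[1:], threshold=0.05)  # assumed tuning value

rng = np.random.default_rng(0)
boot_samples = [simulate_garch11(theta_hat[0], alpha_s, beta_s, n, rng) for _ in range(5)]
print(f"bootstrap DGP uses alpha = {alpha_s}, beta = {beta_s}")
```

Here the small estimate of `beta` is replaced by zero, so the bootstrap samples are generated from an ARCH(1) process that sits on the boundary rather than just inside it.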

5. High-Dimensional, Nonstandard, and Causal Models

Series-Based, Nonparametric, and Spatial Dependence

For spatial data, series-based specification tests project the regression function onto a high-dimensional basis and test the specification via a quadratic form that compares the residuals from the parametric fit with those from the series fit (Gupta et al., 2021).

Causal Inference: Propensity Score Specification Tests

Recent nonparametric tests for propensity score specification utilize infinite families of unconditional moment restrictions. A projected empirical process is computed for each member of the family, and CvM or KS-type test statistics are constructed from it. This projection eliminates first-order effects of nuisance parameter estimation, ensuring valid size and high power in high-dimensional covariate settings. A multiplier bootstrap provides valid critical values (Sant'Anna et al., 2016).

Quantile, Nonstationary, and Measurement Error Models

Tests for quantile regression specification allow quantile-dependent nonlinear covariate effects or basis expansions and compare model-implied distributions to the empirical joint law via CvM statistics. Power is enhanced using flexible sieve expansions and bootstrap critical values (Kutzker et al., 2021). For nonstationary regressors, kernel-smoothed U-statistics indexed over pairwise time series combinations and normalized by intersection local time allow specification assessment in nonlinear, potentially cointegrated systems (Wang et al., 2012).

Measurement error models require deconvolution of residual empirical processes and orthogonal projections to remove parameter estimation effects. This enables the construction of ICM-type statistics with valid multiplier bootstrap procedures, handling both known and unknown error distributions (Song et al., 2025).

6. Bootstrap and Post-Selection Inference

A recurring feature in modern specification testing is the necessity of bootstrap procedures for valid inference, due to non-pivotal asymptotic laws and complex dependence on nuisance parameters or boundary regions of the parameter space. Shrinkage, multiplier, and residual bootstraps, combined with techniques such as the delete-$d$ jackknife or spectral approximations, are essential for reliable critical values.

The interaction of specification pre-tests and subsequent inference is governed by convexity and Gaussianity: post-test inference remains at least as conservative as the nominal level, even when the test statistic and parameter estimator are dependent (Chaisemartin et al., 2024). Under the null, pre-testing cannot make confidence intervals "liberal" (i.e., under-cover).
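
A small Monte Carlo experiment illustrates this coverage property in the simplest Gaussian case. It assumes that the standardized estimator and the pre-test statistic are jointly normal with correlation `rho` and that the pre-test accepts on a symmetric interval; both are simplifying assumptions for this sketch, not the general setting of the cited result.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n_draws, crit = 0.8, 200_000, 1.96

# Standardized estimator and a correlated, standardized pre-test statistic
z_est = rng.standard_normal(n_draws)
z_test = rho * z_est + np.sqrt(1 - rho ** 2) * rng.standard_normal(n_draws)

covered = np.abs(z_est) <= crit      # nominal 95% interval covers the true value
passed = np.abs(z_test) <= crit      # pre-test does not reject (symmetric, convex region)

unconditional = covered.mean()
conditional = covered[passed].mean()
print(f"unconditional coverage:         {unconditional:.4f}")
print(f"coverage given pre-test passed: {conditional:.4f}")
```

Conditioning on a passed pre-test should raise coverage above the nominal level here, in line with the claim that pre-testing does not make intervals under-cover under the null.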

7. Model-Specific and Domain-Specific Testing

Specialized tests have been introduced for unique econometric structures:

Mixed-frequency regression: Distinguishing time-averaged and MIDAS models using a variant of the Hausman test with carefully constructed instruments optimized for high frequency ratios (Liu et al., 2018).
Stochastic frontier models: Goodness-of-fit for composite error distributions (e.g., normal/gamma, stable/gamma) using empirical transform distances (MGF, CF) and weighted quadratic statistics, with parametric bootstrap critical values to accommodate parameter estimation uncertainty (Papadimitriou et al., 2022).
Diffusion models with latent volatility: Residual-based marked empirical processes are used to test for functional form in drift and diffusion coefficients, again relying on bootstrap calibration (López-Pérez et al., 2022).

Table: Core Specification Test Types and Key Features

Test Class	Statistic / Criterion	Bootstrap/Correction
Hausman / Weak IV	Contrast $\Delta_n = \hat\beta_{2SLS} - \hat\beta_{OLS}$	Delete-$d$ jackknife for its covariance (Huang et al., 2023)
GMM/J/Overid/RESET	Quadratic/convex form	Asymptotic χ² (Chaisemartin et al., 2024)
Empirical process (KS/CvM)	Sup / Integral norm	Bootstrap for nuisance dependence
Residuals in volatility/diffusion	Marked empirical process	Bootstrap via Kalman/sequential MC
GARCH boundary	Marked empirical process on squared residuals	Shrinkage bootstrap (Cavaliere et al., 2021)
Propensity score (causal)	Projected empirical CvM	Multiplier bootstrap (Sant'Anna et al., 2016)
Quantile regression	Empirical cdf/CvM	Bootstrap over resampled/predicted
Measurement error	Deconvolved empirical process	Multiplier bootstrap (Song et al., 2025)
Series/Spatial	Quadratic form of series fit vs parametric fit	Residual bootstrap (Gupta et al., 2021)

Specification tests in econometrics now encompass a broad toolkit ranging from classical overidentification and Lagrange multiplier approaches to bootstrap-calibrated empirical processes tailored for latent variable, high-dimensional, causal, and nonstationary settings. Methodological advances ensure robustness to identification failures, high-dimensional covariate structures, measurement error, weak instruments, and spatial or temporal dependence, and they provide reliable inference after pre-testing. The discipline continues to refine specification testing in response to ever more ambitious modeling and inference challenges.