
Hannan-Quinn Information Criterion

Updated 30 June 2025
  • Hannan-Quinn Information Criterion (HQIC) is a model selection tool that uses a $2k \ln \ln n$ penalty to balance model fit and complexity for strong consistency.
  • Its penalty grows slower than BIC's yet faster than AIC's, achieving the minimal rate needed to prevent overfitting as sample size increases.
  • HQIC is applied across autoregressive models, linear regression, time series, and high-dimensional settings, offering a robust criterion for model selection.

The Hannan-Quinn Information Criterion (HQIC) is a model selection criterion designed to balance model fit and complexity, ensuring consistency in variable and model selection as sample size increases. HQIC was originally developed in the context of autoregressive processes and later rigorously extended to a broad range of statistical models, notably linear regression and time series, and has been further adapted to high-dimensional, structured, and incomplete data settings. Its penalty term is constructed to grow with sample size more slowly than the Bayesian Information Criterion (BIC) but faster than the Akaike Information Criterion (AIC), achieving a theoretically minimal rate for strong consistency in model selection.

1. Theoretical Foundation and General Formulation

HQIC is defined for a statistical model with log-likelihood $\mathcal{L}$, $k$ free parameters, and sample size $n$ as

$$\mathrm{HQIC} = -2\mathcal{L} + 2k \ln \ln n$$

The penalty term $2k \ln \ln n$ increases with sample size $n$, but at the slowest possible rate that guarantees strong consistency in model order or variable selection. Here, strong consistency means that the selected model coincides with the true data-generating model for all sufficiently large $n$, almost surely.

For model selection, one computes the HQIC for candidate models and selects the model with the lowest HQIC.
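As a minimal illustration of this recipe (the log-likelihood values below are made up purely for the example; only `numpy` is assumed):

```python
import numpy as np

def hqic(loglik: float, k: int, n: int) -> float:
    """Hannan-Quinn criterion: -2*loglik + 2*k*ln(ln(n)); requires n > e."""
    return -2.0 * loglik + 2.0 * k * np.log(np.log(n))

# Candidate models: (name, maximized log-likelihood, number of free parameters)
n = 500
candidates = [("M1", -712.4, 2), ("M2", -705.1, 4), ("M3", -704.8, 7)]

scores = {name: hqic(ll, k, n) for name, ll, k in candidates}
best = min(scores, key=scores.get)
print(scores, "-> selected:", best)
```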

2. Origin and Minimal Strong Consistency

The HQIC was originally introduced by Hannan and Quinn (1979) in the context of autoregressive (AR) models, who showed that the $2\log\log n$ penalty rate is minimal for strong consistency: no slower-growing penalty (e.g., the constant penalty in AIC) suffices to prevent overfitting as $n$ grows.

For linear regression, one evaluates, for each candidate subset $T$ of predictors,

$$L(z^n, T) = n \log S(T) + k(T)\, d_n$$

where $S(T)$ is the residual sum of squares for model $T$, $k(T) = |T|$ is the number of selected predictors, and $d_n = 2\log\log n$; the selected model is

$$\hat{T}_n = \arg\min_{T \subseteq \{1,\dots,m\}} L(z^n, T)$$

Suzuki (2010) proved that $d_n = 2\log\log n$ is both sufficient and minimal for strong consistency in linear regression selection, just as in AR model selection (1012.4276). Any penalty growing more slowly than $2\log\log n$ results in asymptotic overfitting; BIC’s penalty ($\log n$) is heavier and also consistent, but not minimal.
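A minimal Python sketch of this selection rule on simulated data (assumptions for illustration: Gaussian noise, no intercept, and brute-force enumeration of all $2^m$ subsets; only `numpy` is used):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, m = 400, 5
X = rng.standard_normal((n, m))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.standard_normal(n)  # true subset {0, 2}

def score(T):
    """HQ-penalized criterion L(z^n, T) = n*log S(T) + |T| * 2*log log n."""
    if T:
        XT = X[:, list(T)]
        beta, *_ = np.linalg.lstsq(XT, y, rcond=None)
        resid = y - XT @ beta
    else:
        resid = y  # empty model: no regressors
    S = resid @ resid  # residual sum of squares S(T)
    return n * np.log(S) + len(T) * 2.0 * np.log(np.log(n))

subsets = [T for r in range(m + 1) for T in itertools.combinations(range(m), r)]
T_hat = min(subsets, key=score)
print("selected subset:", T_hat)  # typically (0, 2)
```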

3. Mathematical Properties and Model Selection Behavior

HQIC is a special case within the class of penalized likelihood (or penalized contrast) criteria of the form

$$\text{Criterion} = -2\mathcal{L} + \text{Penalty}$$

with the penalty proportional to the number of parameters times the slow-growing factor $2\ln\ln n$. This functional form ensures:

  • Strong Consistency: Correct model selected almost surely as $n \to \infty$.
  • Minimal Penalty: Penalty grows slowly enough to avoid unnecessary underfitting (selection of an oversimplified model), but just fast enough to prevent persistent overfitting.
  • Theoretical Sharpness: Derived using asymptotic probability bounds and the law of the iterated logarithm for fluctuation analysis of likelihood differences between nested models (see the heuristic bound below).
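To make the sharpness point concrete, an informal statement of the iterated-logarithm argument (heuristic, for a single superfluous parameter under standard regularity conditions) is

$$\limsup_{n \to \infty} \frac{2\left[\mathcal{L}_n(\text{overfitted model}) - \mathcal{L}_n(\text{true model})\right]}{2 \ln\ln n} = 1 \quad \text{almost surely},$$

so a penalty of $2c\ln\ln n$ per superfluous parameter with $c > 1$ eventually dominates any spurious likelihood gain almost surely, whereas a penalty of strictly slower order than $\ln\ln n$ is exceeded infinitely often. This is the precise sense in which the $\ln\ln n$ rate is minimal.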

In time series, this extends to ARMA, GARCH, APARCH, and other affine causal models, where the HQ penalty generalizes to $2cD_m\log\log n$, with $c$ a model-dependent constant and $D_m$ the model dimension (2101.04210).

4. Comparison with Other Information Criteria

| Criterion | Penalty Term | Consistency | Practical Selection Behavior |
|-----------|--------------|-------------|------------------------------|
| AIC | $2k$ | No | Overfits as $n \to \infty$ |
| BIC/MDL | $k \ln n$ | Yes | Tends to underfit (select too-simple models) at small $n$ |
| HQIC | $2k \ln \ln n$ | Yes, minimal | Intermediate; avoids both over- and under-fitting asymptotically |
  • AIC does not provide even weak consistency in many contexts and is prone to overfit at large $n$.
  • BIC / MDL are consistent, but their penalty can lead to underfitting for moderate sample sizes.
  • HQIC provides strong consistency with the smallest possible penalty rate, making it theoretically optimal for identification while remaining effective at practical sample sizes $n$ (see the numeric comparison below).
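For a concrete sense of scale, the per-parameter penalties implied by the table can be tabulated in a few lines of Python (the numbers follow directly from the formulas above):

```python
import math

for n in (50, 500, 5_000, 50_000):
    aic = 2.0                           # AIC: constant per parameter
    bic = math.log(n)                   # BIC: ln n per parameter
    hq = 2.0 * math.log(math.log(n))    # HQIC: 2 ln ln n per parameter
    print(f"n={n:>6}: AIC={aic:.2f}  BIC={bic:.2f}  HQIC={hq:.2f}")
```

Even at $n = 50{,}000$ the HQIC penalty per parameter stays below 5 while BIC’s exceeds 10, which illustrates why HQIC sits between AIC and BIC in practice.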

In empirical studies of high-frequency financial time series (1212.0479), Hawkes processes (1702.06055), and multivariate factor analysis (2407.19959), HQIC is shown to select the correct model order with probability approaching one as $n$ grows, offering a more balanced trade-off than AIC or BIC.

5. Practical Applications and Extensions

a. Linear Regression and Variable Selection

HQIC is now established as the minimal strongly consistent information criterion for variable selection in both the AR and the classical linear regression settings. For each model, the residual sum of squares term is penalized by $2|T|\log\log n$, guaranteeing almost sure identification of the true variable subset as the sample size grows (1012.4276).

b. Time Series and Affine Causal Models

The HQIC penalty extends to ARMA, GARCH, APARCH, and more general time series processes, including those with non-Gaussian innovations, via a penalized likelihood of the form

$$C(m) = -2 L_n(\widehat{\theta}(m)) + 2c D_m \log\log n$$

with $c$ calibrated using model- or data-driven constants (2101.04210).
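As a sketch of this criterion in the AR case (a minimal Python illustration assuming Gaussian innovations and least-squares fitting; $c = 1$ recovers the standard HQIC form, whereas the cited work calibrates $c$ to the model class):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
# Simulate an AR(2) process: x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + e_t
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

P_MAX = 8  # largest candidate order; all fits share a common effective sample

def hq_ar(x, m, c=1.0):
    """C(m) = -2 L_n + 2 c m log log n for an AR(m) least-squares fit,
    with the Gaussian profile likelihood -2 L_n = n_eff * log(sigma2) + const."""
    n_eff = len(x) - P_MAX
    Y = x[P_MAX:]
    if m == 0:
        resid = Y
    else:
        Z = np.column_stack([x[P_MAX - j : len(x) - j] for j in range(1, m + 1)])
        phi, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        resid = Y - Z @ phi
    sigma2 = np.mean(resid**2)
    return n_eff * np.log(sigma2) + 2.0 * c * m * np.log(np.log(n_eff))

m_hat = min(range(P_MAX + 1), key=lambda m: hq_ar(x, m))
print("selected AR order:", m_hat)  # typically 2
```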

c. High-Dimensional and Structured Models

HQIC has been adapted to scenarios with high-dimensional data, structured covariance estimation, and models with incomplete data:

  • Factor Analysis: HQIC is unified with other information-criterion-based rank estimators, and its consistency rests on explicit “gap” conditions relating signal and noise eigenvalues. For HQIC, the penalty is $2\ln\ln n$ times the number of parameters; selection consistency holds if and only if these gap conditions are met (2407.19959).
  • Covariance Estimation with Missing Data (Radar Applications): HQIC is generalized for penalized likelihood selection of source number, using EM estimation to accommodate missing observations (2105.03738).
  • Sparse Signed Graph Learning: In balanced signed graph Laplacian recovery, HQIC selects sparsity regularization parameters by minimizing the penalized negative log-likelihood, facilitating optimal graph recovery with theoretical and empirical support (2506.01826).

d. Empirical Performance

  • In practical model selection tasks, HQIC’s strong consistency and moderate penalty yield better model recovery than AIC (which tends to overfit), while HQIC is often less conservative than BIC (which may underfit with finite data).
  • HQIC remains highly effective unless the number of parameters becomes large relative to the sample size, or parameter estimability breaks down, at which point additional validation (e.g., cross-validation) may be advisable (1212.0479).

6. Relation to Other Information-Theoretic Criteria

HQIC’s role in modern model selection is closely linked to other information-theoretic and coding-based criteria:

  • Switch Criterion: Recent developments have shown that the switch criterion in nested model selection behaves asymptotically like HQIC, achieving minimax risk up to a $\log\log n$ factor together with strong consistency (1408.5724). Both criteria attain the fastest risk-convergence rates allowed by Yang’s impossibility theorem, navigating the trade-off between AIC’s efficiency and BIC’s consistency.
  • Minimum Message Length (MML): MML87, another information-theoretic criterion, can outperform HQIC in mean squared prediction error and selection frequency in certain ARMA time series contexts, especially in small to moderate samples (2110.03250). This suggests HQIC is a robust default from a consistency perspective but may not always yield the best predictive accuracy when sample sizes are small.

7. Summary Formulae and Application Guidance

HQIC is generally implemented via:

$$\mathrm{HQIC} = -2\mathcal{L} + 2k\ln(\ln n)$$

For linear regression and ARMA-type models, $k$ is simply the number of model parameters and $\mathcal{L}$ is the maximized log-likelihood. In high-dimensional, structured, or incomplete-data settings, $k$ should count the effective nonzero parameters, and $\mathcal{L}$ may require imputation or EM estimation.

In practice, the HQIC should be:

  • Computed for all candidate models of interest.
  • Used to select the model (or regularization parameter, or rank) with minimal value, subject to model feasibility.
  • Supplemented by other validation techniques if $n$ is not large relative to $k$, or if parameter estimation is numerically challenging.

The HQIC occupies a prominent, theoretically justified place in the hierarchy of model selection criteria. It strikes a principled balance between avoiding overfitting and preserving model flexibility, with a mathematically minimal penalty rate for strong asymptotic consistency across regression, time series, structured models, and high-dimensional applications.