
Hannan-Quinn Information Criterion

Updated 30 June 2025
  • Hannan-Quinn Information Criterion (HQIC) is a model selection tool that uses a $2k \ln \ln n$ penalty to balance model fit and complexity for strong consistency.
  • Its penalty grows slower than BIC's yet faster than AIC's, achieving the minimal rate needed to prevent overfitting as sample size increases.
  • HQIC is applied across autoregressive models, linear regression, time series, and high-dimensional settings, offering a robust criterion for model selection.

The Hannan-Quinn Information Criterion (HQIC) is a model selection criterion designed to balance model fit and complexity, ensuring consistency in variable and model selection as sample size increases. HQIC was originally developed in the context of autoregressive processes and later rigorously extended to a broad range of statistical models, notably linear regression and time series, and has been further adapted to high-dimensional, structured, and incomplete data settings. Its penalty term is constructed to grow with sample size more slowly than the Bayesian Information Criterion (BIC) but faster than the Akaike Information Criterion (AIC), achieving a theoretically minimal rate for strong consistency in model selection.

1. Theoretical Foundation and General Formulation

HQIC is defined for a statistical model with log-likelihood $\mathcal{L}$, $k$ free parameters, and sample size $n$ as

$$\mathrm{HQIC} = -2\mathcal{L} + 2k \ln \ln n$$

The penalty term $2k \ln \ln n$ increases with sample size $n$, but at the slowest possible rate that guarantees strong consistency in model order or variable selection. Here, strong consistency means that the selected model coincides with the true data-generating model for all sufficiently large $n$, almost surely.

For model selection, one computes the HQIC for candidate models and selects the model with the lowest HQIC.
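As a minimal illustration of this recipe (the log-likelihood values below are made up purely for the example; only `numpy` is assumed):

```python
import numpy as np

def hqic(loglik: float, k: int, n: int) -> float:
    """Hannan-Quinn criterion: -2*loglik + 2*k*ln(ln(n)); requires n > e."""
    return -2.0 * loglik + 2.0 * k * np.log(np.log(n))

# Candidate models: (name, maximized log-likelihood, number of free parameters)
n = 500
candidates = [("M1", -712.4, 2), ("M2", -705.1, 4), ("M3", -704.8, 7)]

scores = {name: hqic(ll, k, n) for name, ll, k in candidates}
best = min(scores, key=scores.get)
print(scores, "-> selected:", best)
```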

2. Origin and Minimal Strong Consistency

The HQIC was originally introduced by Hannan and Quinn (1979) in the context of autoregressive (AR) models, who showed that the $2\log\log n$ penalty rate is minimal for strong consistency: no slower-growing penalty (e.g., the constant penalty in AIC) suffices to prevent overfitting as $n$ grows.

For linear regression, one evaluates, for each candidate subset $T$ of predictors,

$$L(z^n, T) = n \log S(T) + k(T)\, d_n$$

where $S(T)$ is the residual sum of squares for model $T$, $k(T) = |T|$ is the number of selected predictors, and $d_n = 2\log\log n$; the selected model is

$$\hat{T}_n = \arg\min_{T \subseteq \{1,\dots,m\}} L(z^n, T)$$

Suzuki (2010) proved that $d_n = 2\log\log n$ is both sufficient and minimal for strong consistency in linear regression selection, just as in AR model selection (1012.4276). Any penalty growing more slowly than $2\log\log n$ results in asymptotic overfitting; BIC’s penalty ($\log n$) is heavier and also consistent, but not minimal.
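A minimal Python sketch of this selection rule on simulated data (assumptions for illustration: Gaussian noise, no intercept, and brute-force enumeration of all $2^m$ subsets; only `numpy` is used):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, m = 400, 5
X = rng.standard_normal((n, m))
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.standard_normal(n)  # true subset {0, 2}

def score(T):
    """HQ-penalized criterion L(z^n, T) = n*log S(T) + |T| * 2*log log n."""
    if T:
        XT = X[:, list(T)]
        beta, *_ = np.linalg.lstsq(XT, y, rcond=None)
        resid = y - XT @ beta
    else:
        resid = y  # empty model: no regressors
    S = resid @ resid  # residual sum of squares S(T)
    return n * np.log(S) + len(T) * 2.0 * np.log(np.log(n))

subsets = [T for r in range(m + 1) for T in itertools.combinations(range(m), r)]
T_hat = min(subsets, key=score)
print("selected subset:", T_hat)  # typically (0, 2)
```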

3. Mathematical Properties and Model Selection Behavior

HQIC is a special case within the class of penalized likelihood (or penalized contrast) criteria of the form

$$\text{Criterion} = -2\mathcal{L} + \text{Penalty}$$

with the penalty proportional to the number of parameters times the slow-growing factor $2\ln\ln n$. This functional form ensures:

  • Strong Consistency: Correct model selected almost surely as $n \to \infty$.
  • Minimal Penalty: Penalty grows slowly enough to avoid unnecessary underfitting (selection of an oversimplified model), but just fast enough to prevent persistent overfitting.
  • Theoretical Sharpness: Derived using asymptotic probability bounds and the law of the iterated logarithm for fluctuation analysis of likelihood differences between nested models (see the heuristic bound below).
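To make the sharpness point concrete, an informal statement of the iterated-logarithm argument (heuristic, for a single superfluous parameter under standard regularity conditions) is

$$\limsup_{n \to \infty} \frac{2\left[\mathcal{L}_n(\text{overfitted model}) - \mathcal{L}_n(\text{true model})\right]}{2 \ln\ln n} = 1 \quad \text{almost surely},$$

so a penalty of $2c\ln\ln n$ per superfluous parameter with $c > 1$ eventually dominates any spurious likelihood gain almost surely, whereas a penalty of strictly slower order than $\ln\ln n$ is exceeded infinitely often. This is the precise sense in which the $\ln\ln n$ rate is minimal.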

In time series, this extends to ARMA, GARCH, APARCH, and other affine causal models, where the HQ penalty generalizes to $2cD_m\log\log n$, with $c$ a model-dependent constant and $D_m$ the model dimension (2101.04210).

4. Comparison with Other Information Criteria

| Criterion | Penalty Term | Consistency | Practical Selection Behavior |
|-----------|--------------|-------------|------------------------------|
| AIC | $2k$ | No | Overfits as $n \to \infty$ |
| BIC/MDL | $k \ln n$ | Yes | Tends to underfit (select too-simple models) at small $n$ |
| HQIC | $2k \ln \ln n$ | Yes, minimal | Intermediate; avoids both over- and under-fitting asymptotically |
  • AIC does not provide even weak consistency in many contexts and is prone to overfit at large $n$.
  • BIC / MDL are consistent, but their penalty can lead to underfitting for moderate sample sizes.
  • HQIC provides strong consistency with the smallest possible penalty rate, making it theoretically optimal for identification while remaining effective at practical sample sizes $n$ (see the numeric comparison below).
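For a concrete sense of scale, the per-parameter penalties implied by the table can be tabulated in a few lines of Python (the numbers follow directly from the formulas above):

```python
import math

for n in (50, 500, 5_000, 50_000):
    aic = 2.0                           # AIC: constant per parameter
    bic = math.log(n)                   # BIC: ln n per parameter
    hq = 2.0 * math.log(math.log(n))    # HQIC: 2 ln ln n per parameter
    print(f"n={n:>6}: AIC={aic:.2f}  BIC={bic:.2f}  HQIC={hq:.2f}")
```

Even at $n = 50{,}000$ the HQIC penalty per parameter stays below 5 while BIC’s exceeds 10, which illustrates why HQIC sits between AIC and BIC in practice.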

In empirical studies of high-frequency financial time series (1212.0479), Hawkes processes (1702.06055), and multivariate factor analysis (2407.19959), HQIC is shown to select the correct model order with probability approaching one as $n$ grows, offering a more balanced trade-off than AIC or BIC.

5. Practical Applications and Extensions

a. Linear Regression and Variable Selection

HQIC is now established as the minimal strongly consistent information criterion for variable selection in both the AR and the classical linear regression settings. For each model, the residual sum of squares term is penalized by $2|T|\log\log n$, guaranteeing almost sure identification of the true variable subset as the sample size grows (1012.4276).

b. Time Series and Affine Causal Models

The HQIC penalty extends to ARMA, GARCH, APARCH, and more general time series processes, including those with non-Gaussian innovations, via a penalized likelihood of the form

$$C(m) = -2 L_n(\widehat{\theta}(m)) + 2c D_m \log\log n$$

with $c$ calibrated using model- or data-driven constants (2101.04210).
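As a sketch of this criterion in the AR case (a minimal Python illustration assuming Gaussian innovations and least-squares fitting; $c = 1$ recovers the standard HQIC form, whereas the cited work calibrates $c$ to the model class):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
# Simulate an AR(2) process: x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + e_t
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

P_MAX = 8  # largest candidate order; all fits share a common effective sample

def hq_ar(x, m, c=1.0):
    """C(m) = -2 L_n + 2 c m log log n for an AR(m) least-squares fit,
    with the Gaussian profile likelihood -2 L_n = n_eff * log(sigma2) + const."""
    n_eff = len(x) - P_MAX
    Y = x[P_MAX:]
    if m == 0:
        resid = Y
    else:
        Z = np.column_stack([x[P_MAX - j : len(x) - j] for j in range(1, m + 1)])
        phi, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        resid = Y - Z @ phi
    sigma2 = np.mean(resid**2)
    return n_eff * np.log(sigma2) + 2.0 * c * m * np.log(np.log(n_eff))

m_hat = min(range(P_MAX + 1), key=lambda m: hq_ar(x, m))
print("selected AR order:", m_hat)  # typically 2
```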

c. High-Dimensional and Structured Models

HQIC has been adapted to scenarios with high-dimensional data, structured covariance estimation, and models with incomplete data:

  • Factor Analysis: HQIC is unified with other information-criterion-based rank estimators, and its consistency rests on explicit “gap” conditions relating signal and noise eigenvalues. For HQIC, the penalty is $2\ln\ln n$ times the number of parameters; selection consistency holds if and only if these gap conditions are met (2407.19959).
  • Covariance Estimation with Missing Data (Radar Applications): HQIC is generalized for penalized likelihood selection of source number, using EM estimation to accommodate missing observations (2105.03738).
  • Sparse Signed Graph Learning: In balanced signed graph Laplacian recovery, HQIC selects sparsity regularization parameters by minimizing the penalized negative log-likelihood, facilitating optimal graph recovery with theoretical and empirical support (2506.01826).

d. Empirical Performance

  • In practical model selection tasks, HQIC’s strong consistency and moderate penalty yield better model recovery than AIC (which tends to overfit), while HQIC is often less conservative than BIC (which may underfit with finite data).
  • HQIC remains highly effective unless the number of parameters becomes large relative to the sample size, or parameter estimability breaks down, at which point additional validation (e.g., cross-validation) may be advisable (1212.0479).

6. Relation to Other Information-Theoretic Criteria

HQIC’s role in modern model selection is closely linked to other information-theoretic and coding-based criteria:

  • Switch Criterion: Recent developments have shown that the switch criterion in nested model selection behaves asymptotically like HQIC, achieving minimax risk up to a $\log\log n$ factor together with strong consistency (1408.5724). Both criteria attain the fastest risk-convergence rates allowed by Yang’s impossibility theorem, navigating the trade-off between AIC’s efficiency and BIC’s consistency.
  • Minimum Message Length (MML): MML87, another information-theoretic criterion, can outperform HQIC in mean squared prediction error and selection frequency in certain ARMA time series contexts, especially in small to moderate samples (2110.03250). This suggests HQIC is a robust default from a consistency perspective but may not always yield the best predictive accuracy when sample sizes are small.

7. Summary Formulae and Application Guidance

HQIC is generally implemented via:

$$\mathrm{HQIC} = -2\mathcal{L} + 2k\ln(\ln n)$$

For linear regression and ARMA-type models, $k$ is simply the number of model parameters and $\mathcal{L}$ is the maximized log-likelihood. In high-dimensional, structured, or incomplete-data settings, $k$ should count the effective nonzero parameters, and $\mathcal{L}$ may require imputation or EM estimation.

In practice, the HQIC should be:

  • Computed for all candidate models of interest.
  • Used to select the model (or regularization parameter, or rank) with minimal value, subject to model feasibility.
  • Supplemented by other validation techniques if $n$ is not large relative to $k$, or if parameter estimation is numerically challenging.

The HQIC occupies a prominent, theoretically justified place in the hierarchy of model selection criteria. It strikes a principled balance between avoiding overfitting and preserving model flexibility, with a mathematically minimal penalty rate for strong asymptotic consistency across regression, time series, structured models, and high-dimensional applications.