ABC-Parametrization in HMMs & Regression
- ABC-parametrization refers to two distinct frameworks: approximate Bayesian computation (ABC) for static parameter inference in hidden Markov models, and abundance-based constraints (ABC) for categorical effects in regression.
- In HMMs, it replaces intractable likelihood evaluations with kernel-smoothed substitutes computed by simulation, trading bias against Monte Carlo variance through the tolerance parameter $\epsilon$.
- In regression, it imposes abundance-weighted sum-to-zero constraints, so that main effects keep their interpretation when interactions are added and are estimated at least as efficiently as under classical coding schemes.
The term ABC-parametrization denotes two distinct frameworks in contemporary statistical methodology: (1) the use of Approximate Bayesian Computation (ABC) for static parameter inference in hidden Markov models (HMMs), and (2) the abundance-based constraints (ABC) parametrization for categorical effect modification in linear regression models. Although they address different problems, both aim to deliver parameter estimates that are computationally feasible, interpretable, and robust in settings where standard approaches break down. The sections below treat each framework in turn, following the foundational papers (Ehrlich et al., 2012; Kowal, 2024).
1. ABC-Parametrization in Hidden Markov Models
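The ABC-parametrization for HMMs is motivated by settings in which the observation (emission) density $g_\theta(y \mid x)$ cannot be evaluated but can be sampled from. Traditional maximum likelihood estimation (MLE) requires evaluating $g_\theta(y_t \mid x_t)$, which is prohibitive for complex or simulator-based models. ABC circumvents this by defining an auxiliary-variable HMM: for each latent state $x_t$, a pseudo-observation $u_t \sim g_\theta(\cdot \mid x_t)$ is generated and compared with the true observation $y_t$ through a kernel $K_\epsilon(y_t, u_t)$, yielding a kernel-smoothed substitute for the likelihood. The bandwidth $\epsilon > 0$ is the "ABC parameter" and regulates the fidelity of the approximation: as $\epsilon \to 0$ the ABC approximation becomes exact, but smaller $\epsilon$ typically produces higher Monte Carlo variance in the particle weights. With initial density $\mu_\theta$ and transition density $f_\theta$, the ABC-parametrized likelihood is

$$
p_\theta^\epsilon(y_{1:n}) = \int \mu_\theta(x_1) \prod_{t=2}^{n} f_\theta(x_t \mid x_{t-1}) \prod_{t=1}^{n} g_\theta^\epsilon(y_t \mid x_t)\, dx_{1:n},
\qquad
g_\theta^\epsilon(y_t \mid x_t) = \frac{1}{Z_\epsilon} \int K_\epsilon(y_t, u_t)\, g_\theta(u_t \mid x_t)\, du_t,
$$

where $Z_\epsilon = \int K_\epsilon(y_t, u)\, du$ is a normalization constant, independent of state and parameter.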
Under strong regularity conditions (joint Lipschitz continuity and boundedness of the transition density $f_\theta$, the observation density $g_\theta$, and their gradients), the log-likelihood and its gradient computed from the ABC-parametrized model differ from their exact counterparts by at most $\mathcal{O}(n\epsilon)$, where $n$ is the sample size (Ehrlich et al., 2012). This makes static parameter estimation possible without ever evaluating $g_\theta$ directly.
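To make the kernel-smoothed substitute concrete, the sketch below estimates $g_\theta^\epsilon(y \mid x)$ using only draws from the observation model. For checkability it assumes a Gaussian observation model $g(\cdot \mid x) = \mathcal{N}(x, 1)$ and a Gaussian kernel with $K_\epsilon(y, u)/Z_\epsilon = \mathcal{N}(y; u, \epsilon^2)$. Under these assumptions the ABC emission density is exactly $\mathcal{N}(y; x, 1 + \epsilon^2)$, which tends to the true density as $\epsilon \to 0$. The names `abc_emission_mc` and `sample_obs` are hypothetical and are not taken from any published implementation.

```python
import math
import random


def normal_pdf(v: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd^2) at v."""
    z = (v - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def abc_emission_mc(y, x, sample_obs, eps, n_sims, rng):
    """Estimate g^eps(y | x) = (1/Z_eps) E[K_eps(y, U)], U ~ g(. | x), by simulation.

    Assumes a Gaussian kernel, so K_eps(y, u) / Z_eps = N(y; u, eps^2).
    `sample_obs(x, rng)` returns one draw from g(. | x); g is never evaluated.
    """
    total = 0.0
    for _ in range(n_sims):
        u = sample_obs(x, rng)
        total += normal_pdf(y, u, eps)  # normalized kernel weight
    return total / n_sims


if __name__ == "__main__":
    rng = random.Random(0)
    simulate_obs = lambda x, r: r.gauss(x, 1.0)  # hypothetical black-box simulator
    true_g = normal_pdf(0.3, 0.0, 1.0)
    for eps in (1.0, 0.5, 0.1):
        est = abc_emission_mc(0.3, 0.0, simulate_obs, eps, 50_000, rng)
        exact_abc = normal_pdf(0.3, 0.0, math.sqrt(1.0 + eps ** 2))
        print(f"eps={eps:4.2f}  MC={est:.4f}  exact ABC={exact_abc:.4f}  true g={true_g:.4f}")
```

The Monte Carlo column tracks the exact ABC value. The gap to the true density shrinks as $\epsilon$ decreases, which is the bias side of the trade-off.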
2. Sequential Monte Carlo (SMC) Methods under the ABC-Parametrization
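The ABC-parametrized HMM lends itself naturally to SMC implementation for both filtering and marginal likelihood computation. At each time $t$, for a set of $N$ particles:
- Predict latent states $x_t^{(i)} \sim f_\theta(\cdot \mid x_{t-1}^{(i)})$ using the Markov transition density $f_\theta$,
- Simulate associated pseudo-observations $u_t^{(i)} \sim g_\theta(\cdot \mid x_t^{(i)})$ from the observation model $g_\theta$,
- Assign weights $w_t^{(i)} = K_\epsilon(y_t, u_t^{(i)}) / Z_\epsilon$, where $Z_\epsilon$ is the kernel's normalization constant,
- Resample if necessary.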
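The SMC estimator of the ABC marginal likelihood,

$$
\hat{p}_\theta^\epsilon(y_{1:n}) = \prod_{t=1}^{n} \frac{1}{N} \sum_{i=1}^{N} w_t^{(i)},
$$

is unbiased, where $w_t^{(i)}$ is the ABC weight of particle $i$ at time $t$. Each factor is the mean kernel weight over the particle set, and this product form assumes resampling at every time step. Because the logarithm of an unbiased estimator is itself biased, a second-order Taylor bias correction is often applied to the log-likelihood estimate. The computational cost is linear in $N$ per time step.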
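A minimal bootstrap-filter sketch of these steps follows. It assumes a hypothetical linear-Gaussian HMM ($x_1 \sim \mathcal{N}(0,1)$, $x_t = \phi x_{t-1} + \sigma_v v_t$, $y_t = x_t + \sigma_w w_t$) so that the ABC estimate can be checked against the exact Kalman-filter log-likelihood. Inside `abc_smc_loglik` the observation model is only simulated, never evaluated. Three further choices are assumptions for illustration: the uniform kernel $K_\epsilon(y, u) = \mathbf{1}\{|y - u| \le \epsilon\}$ with $Z_\epsilon = 2\epsilon$, multinomial resampling at every step, and all function names.

```python
import math
import random


def simulate(theta, n, rng):
    """Draw y_1..y_n from the hypothetical linear-Gaussian HMM."""
    phi, sigma_v, sigma_w = theta
    x = rng.gauss(0.0, 1.0)
    ys = []
    for t in range(n):
        if t > 0:
            x = phi * x + sigma_v * rng.gauss(0.0, 1.0)
        ys.append(x + sigma_w * rng.gauss(0.0, 1.0))
    return ys


def abc_smc_loglik(y, theta, n_particles, eps, rng):
    """log of prod_t (1/N) sum_i w_t^(i) with uniform-kernel ABC weights."""
    phi, sigma_v, sigma_w = theta
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # x_1 ~ N(0, 1)
    log_lik = 0.0
    for t, y_t in enumerate(y):
        if t > 0:
            # Predict: propagate each particle through the Markov transition.
            particles = [phi * x + sigma_v * rng.gauss(0.0, 1.0) for x in particles]
        # Simulate one pseudo-observation per particle (g is sampled, never evaluated).
        pseudo = [x + sigma_w * rng.gauss(0.0, 1.0) for x in particles]
        # Weight: K_eps(y_t, u) / Z_eps with K_eps = 1{|y_t - u| <= eps}, Z_eps = 2 eps.
        weights = [1.0 / (2.0 * eps) if abs(y_t - u) <= eps else 0.0 for u in pseudo]
        mean_w = sum(weights) / n_particles
        if mean_w == 0.0:
            return float("-inf")  # no particle hit the eps-ball: eps or N too small
        log_lik += math.log(mean_w)
        # Multinomial resampling at every step, matching the product-form estimator.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return log_lik


def kalman_loglik(y, theta):
    """Exact log-likelihood of the same model, for comparison."""
    phi, sigma_v, sigma_w = theta
    m, P, ll = 0.0, 1.0, 0.0
    for t, y_t in enumerate(y):
        if t > 0:
            m, P = phi * m, phi * phi * P + sigma_v ** 2
        S = P + sigma_w ** 2
        ll -= 0.5 * (math.log(2.0 * math.pi * S) + (y_t - m) ** 2 / S)
        gain = P / S
        m, P = m + gain * (y_t - m), (1.0 - gain) * P
    return ll


if __name__ == "__main__":
    theta = (0.8, 0.5, 0.5)
    y = simulate(theta, 20, random.Random(1))
    print("exact  :", kalman_loglik(y, theta))
    for eps in (0.5, 0.2, 0.05):
        print(f"eps={eps}:", abc_smc_loglik(y, theta, 2000, eps, random.Random(2)))
```

With a uniform kernel, the ABC model is the original model with extra $\mathrm{Uniform}(-\epsilon, \epsilon)$ observation noise. For small $\epsilon$ the two log-likelihoods should therefore agree up to Monte Carlo error. A return value of `-inf` means no pseudo-observation landed within $\epsilon$ of some $y_t$. This is the weight-degeneracy failure mode, and the remedy is a larger $\epsilon$ or $N$.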
The choice of $\epsilon$ is pivotal. Larger values mitigate weight degeneracy, because more pseudo-observations receive non-negligible weight, but they increase bias. Smaller $\epsilon$ yields a more accurate approximation at the cost of greater Monte Carlo variance (Ehrlich et al., 2012).
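The trade-off can be quantified for a single weighting step. The sketch below assumes, hypothetically, that the $N$ pseudo-observations are i.i.d. $\mathcal{N}(m, s^2)$ and that the kernel is uniform on $[y - \epsilon, y + \epsilon]$. Let $p_\epsilon$ be the probability that a pseudo-observation lands in this window. The mean weight then has expectation $p_\epsilon / (2\epsilon)$, which is the ABC density, and relative variance $(1 - p_\epsilon)/(N p_\epsilon)$. Shrinking $\epsilon$ therefore reduces the bias but inflates the variance.

```python
import math


def normal_cdf(v, mean, sd):
    return 0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0))))


def normal_pdf(v, mean, sd):
    z = (v - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def eps_tradeoff(y, mean, sd, eps, n_particles):
    """Return (bias, relative variance) of one step's mean ABC weight."""
    p_hit = normal_cdf(y + eps, mean, sd) - normal_cdf(y - eps, mean, sd)
    abc_density = p_hit / (2.0 * eps)                 # E[mean weight]
    bias = abc_density - normal_pdf(y, mean, sd)      # error vs. exact density
    rel_var = (1.0 - p_hit) / (n_particles * p_hit)   # Var(mean w) / E[mean w]^2
    return bias, rel_var


if __name__ == "__main__":
    for eps in (1.0, 0.3, 0.1, 0.03):
        bias, rel_var = eps_tradeoff(0.5, 0.0, 1.0, eps, 1000)
        print(f"eps={eps:5.2f}  bias={bias:+.5f}  relative variance={rel_var:.5f}")
```

Reading down the printed table, the absolute bias falls roughly like $\epsilon^2$ and the relative variance grows roughly like $1/\epsilon$. The "sweet spot" lies where neither dominates for the available $N$.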
3. SPSA-Based Static Parameter Estimation in ABC-HMMs
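Simultaneous Perturbation Stochastic Approximation (SPSA) is used for MLE or recursive MLE in ABC-HMMs. At each iteration (time step, in the recursive version) $t$, the gradient of the ABC log-likelihood $\ell^\epsilon(\theta)$ with respect to $\theta$ is estimated by simultaneous-perturbation finite differences:
- Evaluate the log-likelihood (or its increment, for recursive MLE) at $\theta_{t-1} + c_t\Delta_t$ and $\theta_{t-1} - c_t\Delta_t$ using two SMC runs,
- Compute the gradient estimate $\hat{\nabla}_t = \frac{\hat{\ell}^\epsilon(\theta_{t-1} + c_t\Delta_t) - \hat{\ell}^\epsilon(\theta_{t-1} - c_t\Delta_t)}{2c_t}\, \Delta_t^{-1}$, where $\Delta_t^{-1}$ is the componentwise reciprocal,
- Update $\theta_t = \theta_{t-1} + a_t \hat{\nabla}_t$.

The step-size sequences $(a_t)$ and $(c_t)$ follow standard stochastic approximation rules ($a_t, c_t \to 0$, $\sum_t a_t = \infty$, $\sum_t (a_t/c_t)^2 < \infty$). The $\Delta_t$ are Rademacher vectors with independent $\pm 1$ entries. The procedure avoids direct calculation of $g_\theta$ or its derivatives, making parameter learning practical even when the observation model is a black-box simulator (Ehrlich et al., 2012).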
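The recursion can be sketched in a few lines. This is a batch (offline) version in which each iteration uses two full log-likelihood estimates; a recursive variant would use log-likelihood increments as new observations arrive. The argument `loglik` stands in for an SMC estimate of the ABC log-likelihood. In the demo it is replaced by a hypothetical noisy concave quadratic (`toy_noisy_loglik`), so the maximizer is known. The gain sequences $a_t = a/(t+1+A)^{0.602}$ and $c_t = c/(t+1)^{0.101}$ use exponents commonly recommended in the SPSA literature. They are an assumption, not a choice taken from Ehrlich et al. (2012).

```python
import random


def spsa_maximize(loglik, theta0, n_iter, rng, a=0.2, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101):
    """Maximize a noisy, black-box log-likelihood estimate by SPSA."""
    theta = list(theta0)
    for t in range(n_iter):
        a_t = a / (t + 1 + A) ** alpha   # step size
        c_t = c / (t + 1) ** gamma       # perturbation size
        # Rademacher perturbation: each entry is -1 or +1 with probability 1/2.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [th + c_t * d for th, d in zip(theta, delta)]
        minus = [th - c_t * d for th, d in zip(theta, delta)]
        # Two independent noisy evaluations (two SMC runs in the ABC-HMM setting).
        diff = loglik(plus, rng) - loglik(minus, rng)
        # Gradient estimate diff / (2 c_t) * Delta^{-1}, componentwise.
        grad = [diff / (2.0 * c_t * d) for d in delta]
        # Ascent step, since the log-likelihood is being maximized.
        theta = [th + a_t * g for th, g in zip(theta, grad)]
    return theta


def toy_noisy_loglik(theta, rng, target=(0.7, -0.3), noise_sd=0.1):
    """Hypothetical stand-in for an SMC estimate: concave quadratic plus noise."""
    value = -sum((th - m) ** 2 for th, m in zip(theta, target))
    return value + rng.gauss(0.0, noise_sd)


if __name__ == "__main__":
    estimate = spsa_maximize(toy_noisy_loglik, [0.0, 0.0], 2000, random.Random(3))
    print(estimate)  # should lie close to the toy maximizer (0.7, -0.3)
```

Only two evaluations are needed per iteration, whatever the dimension of $\theta$. This is what makes SPSA attractive when each evaluation is a full SMC run.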
4. Abundance-Based Constraints (ABC) Parametrization in Categorical Regression
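For linear models involving categorical covariates and their interactions with continuous predictors (cat-modified models), the ABC-parametrization (abundance-based constraints) provides an alternative to reference-group or sum-to-zero coding (Kowal, 2024). The core principle is to constrain each set of category-specific coefficients (main effects and interactions) to have a weighted sum of zero, with weights equal to the empirical sample proportions of the categories. Formally, consider a categorical variable $C$ with levels $k = 1, \dots, K$, category-specific main effects $\alpha_k$, and interactions $\gamma_{jk}$ with continuous covariate $x_j$. The constraints are

$$
\sum_{k=1}^{K} \hat{\pi}_k \alpha_k = 0, \qquad \sum_{k=1}^{K} \hat{\pi}_k \gamma_{jk} = 0 \quad \text{for each } j,
$$

where $\hat{\pi}_k = n_k / n$ is the proportion of observations in category $k$.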
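The constraint has a direct reading in the simplest case, where the categorical variable is the only covariate. Least squares fits each level by its group mean $\bar{y}_k$. The ABC intercept is then the abundance-weighted average $\sum_k \hat{\pi}_k \bar{y}_k$, which equals the overall mean $\bar{y}$, and each $\alpha_k$ is a deviation from it. The sketch below uses a hypothetical helper, `abc_one_way`, to compute this decomposition with the standard library. It contrasts the result with the unweighted sum-to-zero intercept.

```python
from collections import Counter, defaultdict


def abc_one_way(groups, y):
    """ABC decomposition of group means: intercept plus abundance-weighted deviations."""
    n = len(y)
    counts = Counter(groups)
    totals = defaultdict(float)
    for g, v in zip(groups, y):
        totals[g] += v
    pi = {k: counts[k] / n for k in counts}               # empirical proportions
    ybar = {k: totals[k] / counts[k] for k in counts}     # OLS fit for each level
    intercept = sum(pi[k] * ybar[k] for k in counts)      # abundance-weighted average
    alpha = {k: ybar[k] - intercept for k in counts}      # sum_k pi_k * alpha_k = 0
    stz_intercept = sum(ybar.values()) / len(ybar)        # unweighted STZ intercept
    return intercept, alpha, pi, stz_intercept


if __name__ == "__main__":
    groups = ["a", "a", "a", "b", "c", "c"]
    y = [1.0, 2.0, 3.0, 10.0, 4.0, 6.0]
    intercept, alpha, pi, stz = abc_one_way(groups, y)
    print(intercept, stz)  # 26/6 ≈ 4.333 (overall mean) vs. 17/3 ≈ 5.667
    print(alpha)
    print(sum(pi[k] * alpha[k] for k in alpha))  # zero up to rounding
```

The sum-to-zero intercept gives the rare level "b" the same weight as the common level "a". The ABC intercept weights each level by how often it occurs.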
The ABC-parametrization ensures that main effects are interpreted as abundance-weighted averages across groups and that estimates and standard errors (SEs) of main effects are invariant when categorical modifiers are included, provided group variances are homogeneous. Furthermore, standard errors for main effects are non-increasing when cat-modifiers are added (Kowal, 2024).
5. Comparison with Other Parametrizations in Linear Regression
Classical reference-group encoding (RGE) and sum-to-zero (STZ) constraints both change the interpretation of main effects when interaction terms are introduced, and both can inflate their SEs. RGE additionally biases estimates toward the reference group under regularization. In contrast, the ABC approach:
- Removes reference-group arbitrariness,
- Stabilizes main effect estimation under model enrichment with modifiers,
- Maintains or improves the efficiency (lower or equal SEs) of main effects,
- Facilitates transparent interpretation, as main effects represent global averages.
A structured approach proceeds in six steps:
- construct the overparametrized design matrix,
- compute the empirical proportions,
- form the constraint matrix,
- use a QR decomposition to obtain a basis for the constraint nullspace,
- fit an unconstrained regression in the reduced coordinates,
- back-transform to the original parameter space.

ABC-based penalized estimation (e.g., lasso, ridge) proceeds analogously.
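The steps above can be sketched with NumPy for one categorical variable and one centered continuous covariate $x$. The design uses one indicator column per level, so on its own it is rank-deficient; the two ABC constraints restore identifiability. A complete QR decomposition of $C^\top$ supplies an orthonormal basis $N$ of the nullspace of the constraint matrix $C$. Ordinary least squares is then run on $XN$, and $\beta = N\delta$ maps the result back. The function name `abc_ols` is hypothetical, this is not Kowal's software, and penalized variants are not shown.

```python
import numpy as np


def abc_ols(x, y, groups):
    """OLS for y ~ 1 + x + group + x:group under abundance-based constraints.

    Coefficient order: [intercept, slope, alpha_1..alpha_K, gamma_1..gamma_K].
    """
    levels = np.unique(groups)
    K = len(levels)
    D = (groups[:, None] == levels[None, :]).astype(float)  # n x K indicators
    X = np.column_stack([np.ones_like(x), x, D, D * x[:, None]])
    pi = D.mean(axis=0)  # empirical proportions n_k / n
    # Constraint matrix C: sum_k pi_k alpha_k = 0 and sum_k pi_k gamma_k = 0.
    C = np.zeros((2, X.shape[1]))
    C[0, 2:2 + K] = pi
    C[1, 2 + K:] = pi
    # Complete QR of C^T: columns of Q beyond rank(C) = 2 span the nullspace of C.
    Q, _ = np.linalg.qr(C.T, mode="complete")
    N = Q[:, C.shape[0]:]
    # Unconstrained least squares in reduced coordinates, then back-transform.
    delta, *_ = np.linalg.lstsq(X @ N, y, rcond=None)
    beta = N @ delta
    return beta, levels, pi, X, C


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 300
    groups = rng.choice(np.array(["a", "b", "c"]), size=n, p=[0.6, 0.3, 0.1])
    x = rng.normal(size=n)
    x = x - x.mean()  # center the continuous covariate
    y = 1.0 + 2.0 * x + 0.5 * (groups == "b") + rng.normal(scale=0.3, size=n)
    beta, levels, pi, X, C = abc_ols(x, y, groups)
    print("levels:", levels, "proportions:", pi.round(3))
    print("coefficients:", beta.round(3))
    print("constraints C @ beta:", C @ beta)  # zero up to rounding
```

The constrained model spans the same column space as reference-group coding, so fitted values are identical under either parametrization; only the meaning of the coefficients changes. Here the slope main effect equals the abundance-weighted average of the group-specific slopes.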
| Parametrization | Reference-group bias | Main effects invariant to interactions | SE inflation with interactions |
|---|---|---|---|
| Reference-group (RGE) | Yes | No | Yes |
| Sum-to-zero (STZ) | No | No | Possible |
| Abundance-based (ABC) | No | Yes (under homogeneity) | No |
6. Application and Practical Guidelines
Empirical studies support the theoretical claims for both frameworks. In hidden Markov models, applications to linear-Gaussian and chaotic Lorenz systems show that bias is driven mainly by $\epsilon$ and Monte Carlo variance mainly by the number of particles $N$. They also show that, with suitable particle numbers and kernel widths, the static parameters are estimated reliably (Ehrlich et al., 2012). In regression, simulations and real-data analyses show that OLS estimates of main effects are preserved under model enrichment (e.g., adding categorical interactions), while their SEs are typically reduced (Kowal, 2024).
Key recommendations include:
- For ABC-HMMs, tune $\epsilon$ to balance bias against Monte Carlo variance, typically by locating a "sweet spot" where both are acceptable.
- For ABC in regression, center continuous covariates, check for variance/covariance homogeneity across groups, and employ empirical proportions in constraint formation.
7. Theoretical and Methodological Conditions
The performance and interpretation guarantees of ABC-parametrization rely on specific assumptions:
- In HMMs: Lipschitz continuity and boundedness of the transition and observation densities and their derivatives, for bias control (Ehrlich et al., 2012).
- In regression: centered covariates, variance/covariance homogeneity within groups, absence of perfect collinearity, and classical OLS error assumptions (Kowal, 2024).
Under these conditions, ABC-parametrization provides a principled, data-driven mechanism for model specification, estimation, and inference in complex latent-variable and heterogeneous-effect settings. In HMMs it enables likelihood-free parameter estimation with controlled bias. In regression it gives main effects clear global interpretations that remain stable, and efficiently estimated, as the model is extended.