Concurvity Regularization in Additive Models
- Concurvity regularization is a differentiable constraint that mitigates non-linear dependencies between feature transformations in additive models.
- It penalizes pairwise Pearson correlations among transformed features to enforce near-orthogonality and improve interpretability.
- Integrated into models like GAMs and NAMs, the method has been empirically shown to reduce attribution ambiguity and enhance feature stability.
Concurvity regularization refers to a class of differentiable constraints that act to reduce non-linear dependencies—termed "concurvity"—between the transformed feature representations in generalized additive models (GAMs) and, by extension, in any differentiable additive model. Concurvity is the non-linear generalization of multicollinearity, occurring when a nontrivial non-linear combination of features or their shape functions yields zero, thereby undermining the interpretability of the model and the stability of feature attributions. The principal approach to concurvity regularization is to penalize pairwise correlations among these non-linear feature mappings, thus promoting near-orthogonality in the transformed feature space and yielding interpretable, stable model decompositions that do not suffer from ambiguity or self-cancellation in feature contributions (Siems et al., 2023).
1. Definition and Manifestation of Concurvity in Additive Models
Formally, a standard generalized additive model (GAM) predicts via the expression

$$g\big(\mathbb{E}[y \mid x]\big) = \beta_0 + \sum_{i=1}^{p} f_i(x_i),$$

where $x = (x_1, \dots, x_p)$ denotes the features, $f_i$ are the univariate "shape" functions, and $g$ is a link function. In this setting, concurvity is present if there exist functions $\tilde{f}_1, \dots, \tilde{f}_p$ in the admissible function class and a constant $c$ such that

$$\sum_{i=1}^{p} \tilde{f}_i(x_i) = c \quad \text{almost surely},$$

with not all $\tilde{f}_i$ being constant functions.
Concurvity, unlike linear multicollinearity, encompasses all (possibly non-linear) dependencies that allow transformed features to “cancel out” in the aggregate model sum. In practical applications, strong concurvity can manifest as ambiguities in the attribution of importance to features, high variance in shape function estimation, and diminished interpretability due to self-canceling contributions.
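As a minimal illustration (a constructed example, not taken from the paper): suppose the dataset contains a duplicated feature, $x_2 = x_1$. Then the choice

$$\tilde{f}_1(x_1) = x_1, \qquad \tilde{f}_2(x_2) = -x_2 \quad\Longrightarrow\quad \tilde{f}_1(x_1) + \tilde{f}_2(x_2) = 0 \ \text{almost surely}$$

witnesses concurvity: any multiple of this pair can be added to the fitted shape functions without changing the model's predictions, so the additive decomposition and the resulting feature attributions are not identifiable.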
2. Mathematical Framework for Concurvity Regularization
Concurvity regularization is grounded in enforcing empirical orthogonality (zero Pearson correlation) among all pairs of transformed feature vectors $f_i(X_i)$, $f_j(X_j)$ with $i \neq j$, where $X_i \in \mathbb{R}^n$ collects the $n$ observed values of feature $i$. The empirical Pearson correlation of two vectors $u, v \in \mathbb{R}^n$ is defined as

$$\operatorname{Corr}(u, v) = \frac{\sum_{k=1}^{n} (u_k - \bar{u})(v_k - \bar{v})}{\sqrt{\sum_{k=1}^{n} (u_k - \bar{u})^2}\,\sqrt{\sum_{k=1}^{n} (v_k - \bar{v})^2}},$$

where $\bar{u}$ is the mean of vector $u$. The regularizer itself is the average of the absolute pairwise correlations:

$$\mathcal{R}(f_1, \dots, f_p) = \frac{2}{p(p-1)} \sum_{i=1}^{p} \sum_{j=i+1}^{p} \left| \operatorname{Corr}\big(f_i(X_i), f_j(X_j)\big) \right|.$$

By construction, $\mathcal{R} = 0$ if and only if all pairs of transformed features are empirically uncorrelated. Enforcing this property prohibits nontrivial zero-sum non-linear combinations, conferring robustness against concurvity except for the trivial solution where all $f_i$ are constants (Siems et al., 2023).
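A minimal PyTorch sketch of this measure (our illustration, not the authors' reference code), assuming the transformed features arrive as a `(batch, p)` tensor whose $i$-th column holds $f_i$ evaluated on the minibatch:

```python
import torch

def concurvity_penalty(transformed: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Average absolute pairwise Pearson correlation of the columns of `transformed` (p >= 2)."""
    centered = transformed - transformed.mean(dim=0, keepdim=True)
    norms = centered.norm(dim=0, keepdim=True) + eps   # per-column L2 norm; eps guards constant columns
    normalized = centered / norms                      # centered columns scaled to unit norm
    corr = normalized.T @ normalized                   # (p, p) Pearson correlation matrix
    p = corr.shape[0]
    off_diag = ~torch.eye(p, dtype=torch.bool, device=corr.device)
    return corr[off_diag].abs().mean()                 # mean |corr| over all pairs i != j
```

Normalizing the centered columns to unit norm makes the Gram matrix coincide with the Pearson correlation matrix, so a single matrix product yields all pairwise correlations; the `eps` term guards against division by zero for (near-)constant columns.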
3. Integration into Differentiable Additive Modeling Workflows
The concurvity regularizer is integrated additively into the standard loss function for differentiable additive models, yielding the following learning objective:
$$\min_{\theta} \; \frac{1}{n} \sum_{k=1}^{n} \ell\big(y_k, \hat{y}_k\big) \;+\; \lambda \, \mathcal{R}(f_1, \dots, f_p),$$

where $\hat{y}_k$ denotes the model's prediction for the $k$-th example and $\lambda \geq 0$ controls the trade-off between empirical risk minimization and orthogonality enforcement. In practice, the shape functions $f_i$ may be parametrized as differentiable neural blocks (e.g., MLPs in Neural Additive Models, NAMs, or Fourier-series blocks in NeuralProphet). The regularizer $\mathcal{R}$ is evaluated on each minibatch, allowing for stochastic gradient-based optimization. The loss is propagated and optimized using standard frameworks (e.g., PyTorch, JAX) with explicit care for numerical stability in the correlation computation.
Pseudocode Sketch:
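A minimal sketch of one training step (our construction, reusing `concurvity_penalty` from Section 2; the NAM-style `model`, the squared-error task loss, and `lam` for $\lambda$ are illustrative assumptions, not the authors' reference implementation):

```python
import torch
import torch.nn.functional as F

def training_step(model, x, y, lam, optimizer):
    contributions = model(x)                  # (batch, p): column i holds f_i(x_i)
    y_hat = contributions.sum(dim=1)          # additive prediction (identity link)
    loss = F.mse_loss(y_hat, y) + lam * concurvity_penalty(contributions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```

The only structural requirement is that the model expose its per-feature contributions rather than just the aggregate prediction, so the penalty can be computed on the transformed features.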
This approach is agnostic to whether data are tabular or temporal, since seasonality and trend blocks in time series models are also treated as univariate shape functions.
4. Theoretical Properties and Empirical Validation
The regularizer is theoretically justified: enforcing pairwise orthogonality between non-linear feature transforms guarantees—other than the degenerate case of all-constant transforms—the absence of nontrivial concurvity. In empirical studies, several benchmark tasks illustrate both the pathology of concurvity and the rectification via regularization:
- In settings where two features are perfectly correlated (e.g., a duplicated feature $x_2 = x_1$), unregularized NAMs split the influence arbitrarily between them; regularization drives one feature to capture the effect, rendering feature attribution unique (see the sketch after this list).
- In the presence of a deterministic non-linear dependency (e.g., $x_2 = h(x_1)$ for some non-linear $h$), concurvity is eliminated, and feature attributions become decorrelated without loss of target fit for moderate regularization strength.
- In time series decomposition (e.g., NeuralProphet with daily/weekly components), concurvity regularization yields interpretable, non-overlapping shape functions corresponding to true underlying periodicities, unlike overparametrized or unconstrained models with self-canceling or ambiguous seasonal patterns.
- Across multiple real tabular datasets, a moderate value of $\lambda$ reduces the measured concurvity $\mathcal{R}$ by more than an order of magnitude, with predictive performance (RMSE or AUC) degrading by only a few percent.
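The perfectly-correlated setting from the first bullet can be reproduced in a few lines (a constructed toy check, again reusing `concurvity_penalty` from Section 2):

```python
import torch

torch.manual_seed(0)
x1 = torch.randn(1000)
f1, f2 = x1, -x1.clone()                 # shape functions on a duplicated feature that cancel exactly
feats = torch.stack([f1, f2], dim=1)     # (1000, 2) matrix of transformed features
print(concurvity_penalty(feats).item())  # ~1.0: maximal pairwise |correlation|
print((f1 + f2).abs().max().item())      # 0.0: the contributions self-cancel in the sum
```

A large penalty on a pair whose contributions sum to a constant is exactly the signal the regularizer exploits to break such ties during training.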
Table: Empirical Effects of Concurvity Regularization (Siems et al., 2023)
| Setting | Without Regularization | With Concurvity Regularization |
|---|---|---|
| Perfectly correlated features | Ambiguous splits; high correlation | Unique feature attribution; low $\mathcal{R}$ |
| Non-linear dependencies | Self-canceling shape functions | Decorrelation; interpretable sum |
| Real data (California Housing) | High variance in shape estimation | Stable shape plots; low variance |
The stability of feature importance, as measured by variance across random initializations, is markedly improved once regularization is applied.
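One way to quantify this stability (our sketch; the paper's exact protocol may differ) is to retrain under several random seeds, collect each run's per-feature contributions, and compare the across-seed variance of a simple importance score:

```python
import numpy as np

def importance_variance(contributions_per_seed):
    # contributions_per_seed: list over seeds of (n_samples, p) arrays holding f_i(x_i)
    stacked = np.stack(contributions_per_seed)   # (n_seeds, n_samples, p)
    importance = np.abs(stacked).mean(axis=1)    # mean |contribution| per seed and feature
    return importance.var(axis=0)                # across-seed variance, one value per feature
```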
5. Applications and Context within Explainable Machine Learning
Concurvity regularization directly addresses one of the central interpretability challenges in additive modeling. By decorrelating the contributions of non-linear transformations, it provides more faithful feature importance estimates and eliminates self-cancellation phenomena. The technique generalizes across a variety of model classes—including classical GAMs, NAMs, and NeuralProphet—and is particularly impactful in domains (such as structured tabular or time series modeling) where feature dependencies are prevalent and model interpretability is paramount.
A plausible implication is that, as additive models continue to be deployed in domains demanding transparency (e.g., scientific and high-stakes decision-making), concurvity regularization will be indispensable for robust, interpretable inference in the presence of feature dependencies.
6. Summary and Limitations
Concurvity regularization is a conceptually simple, differentiable penalty on pairwise transformed-feature correlations. It can be integrated into any differentiable additive model estimator. The method significantly reduces non-linear dependencies among feature contributions and yields stable, interpretable decompositions with a mild trade-off in predictive accuracy when appropriately tuned. While enforcing exact orthogonality is too restrictive, the soft penalty provides a practical and effective compromise for modern gradient-based optimization frameworks. No substantial computational overhead is introduced, as correlations are efficiently estimated on minibatches (Siems et al., 2023).