Neural Additive Models (CRISPNAM-FG)

Updated 4 February 2026
  • The paper presents CRISPNAM-FG, an innovative model integrating neural networks with generalized additive structures for transparent competing risks analysis.
  • It achieves competitive performance with TD-AUCs up to 0.986 and employs feature-level attributions for clear, regulatory-compliant predictions.
  • The method offers practical insights for applications in medicine and finance while ensuring auditability through univariate shape functions and rigorous selection.

Neural Additive Models (CRISPNAM-FG) are a class of intrinsically interpretable deep learning models designed for transparent, high-precision prediction in competing risks survival analysis, combining the expressivity of neural networks with the additive, feature-decomposable structure of generalized additive models. The CRISPNAM-FG framework and closely related variants (Generalized Groves of Neural Additive Models) provide end-to-end, glass-box risk modeling with strict feature-level attribution, competitive statistical performance, and regulatory-compliant explanations, and have seen adoption in sensitive applications from medicine to finance (Ramachandram et al., 16 Nov 2025, Chen et al., 2022).

1. Mathematical Foundations

CRISPNAM-FG models the risk of each competing event using a structured, additive neural architecture. Let $T$ denote the observed time, $E \in \{1, \dots, K\}$ the event type among $K$ mutually exclusive risks, and $\mathbf{x} = (x_1, \dots, x_p)$ the $p$-dimensional covariate vector. For each risk $k$:

$$\lambda_k^{\text{FG}}(t \mid \mathbf{x}) = \lambda_{0k}^{\text{FG}}(t) \exp\left( \eta_k(\mathbf{x}) \right), \qquad \eta_k(\mathbf{x}) = \sum_{i=1}^p g_{i,k}\bigl( f_i(x_i) \bigr),$$

where each $f_i : \mathbb{R} \rightarrow \mathbb{R}^d$ is a compact feedforward neural network (FeatureNet), and $g_{i,k}$ projects the feature embedding into a scalar hazard contribution via a (potentially normalized) vector $\mathbf{w}_{i,k}$:

$$g_{i,k}( f_i(x_i) ) = \frac{\mathbf{w}_{i,k}^\top f_i(x_i)}{\| \mathbf{w}_{i,k} \|_2 + \epsilon}.$$

This structure yields univariate "shape functions" $s_{i,k}(x_i) := g_{i,k}(f_i(x_i))$, so that $\eta_k(\mathbf{x})$ is a sum over all features' contributions for event $k$. The additive nature ensures feature-level interpretability.
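As a toy illustration of this additive structure, the sketch below wires together random, untrained stand-ins for the FeatureNets $f_i$ and the normalized risk-specific projections $\mathbf{w}_{i,k}$. All dimensions and weights here are illustrative assumptions, not the paper's trained networks; the point is only the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)

p, d, K = 3, 4, 2                  # features, embedding dim, competing risks
W1 = rng.normal(size=(p, 8, 1))    # toy first-layer weights, one net per feature
W2 = rng.normal(size=(p, d, 8))    # toy second-layer weights
Wproj = rng.normal(size=(p, K, d)) # risk-specific projection vectors w_{i,k}
EPS = 1e-8

def feature_net(i, x):
    """Embedding h_i = f_i(x_i) for a batch of scalar values of feature i."""
    h = np.tanh(W1[i] @ np.atleast_2d(x))   # (8, n)
    return W2[i] @ h                        # (d, n)

def shape_function(i, k, x):
    """s_{i,k}(x_i) = w_{i,k}^T f_i(x_i) / (||w_{i,k}||_2 + eps)."""
    w = Wproj[i, k]
    return (w @ feature_net(i, x)) / (np.linalg.norm(w) + EPS)

def log_risk(X, k):
    """eta_k(x) = sum_i s_{i,k}(x_i), evaluated for each row of X (n, p)."""
    return sum(shape_function(i, k, X[:, i]) for i in range(p))

X = rng.normal(size=(5, p))
eta = log_risk(X, 0)
print(eta.shape)  # (5,)
```

Because the predictor is a sum of univariate terms, perturbing one feature changes $\eta_k$ only through that feature's shape function, which is exactly what makes the per-feature attributions exact rather than approximate.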

The Cumulative Incidence Function (CIF) for cause $k$ under Fine–Gray is:

$$\widehat F_k(t \mid \mathbf{x}) = 1 - \big[ 1 - \widehat F_{0k}(t) \big]^{\exp(\eta_k(\mathbf{x}))}$$

with a Breslow-type baseline estimator:

$$\widehat F_{0k}(t) = \sum_{i: T_i \leq t,\, E_i = k} \frac{d_i}{\sum_{j \in \mathcal{R}_k^{\text{sub}}(T_i)} \exp(\eta_k(\mathbf{x}_j))}$$

where $\mathcal{R}_k^{\text{sub}}(t)$ is the Fine–Gray risk set, which appropriately retains subjects who failed from competing causes prior to $t$ (Ramachandram et al., 16 Nov 2025).
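The baseline estimator and CIF above can be traced on a handful of made-up subjects. The sketch assumes unit event counts and unit weights, and the times, event labels, and log-risks are invented for illustration; a real implementation would take them from fitted FeatureNets.

```python
import numpy as np

# Made-up toy data: times T, event types E (0 = censored, cause 1 vs. 2),
# and fitted log-risks eta_1(x_j) for each subject.
T = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
E = np.array([1,   2,   1,   0,   1  ])
eta = np.array([0.5, -0.2, 0.1, 0.0, -0.4])

def fg_risk_set(t, k=1):
    # Fine-Gray subdistribution risk set at time t: subjects still at risk
    # (T >= t) plus those who already failed from a competing cause before t.
    return (T >= t) | ((E != k) & (E != 0) & (T < t))

def breslow_baseline(t, k=1):
    # Sum over cause-k event times T_i <= t of d_i / sum_{j in R} exp(eta_j).
    total = 0.0
    for Ti in np.unique(T[(E == k) & (T <= t)]):
        d = np.sum((T == Ti) & (E == k))
        R = fg_risk_set(Ti, k)
        total += d / np.sum(np.exp(eta[R]))
    return total

def cif(t, eta_x, k=1):
    # F_k(t | x) = 1 - (1 - F0k(t))^{exp(eta_k(x))}
    return 1.0 - (1.0 - breslow_baseline(t, k)) ** np.exp(eta_x)

print(cif(6.0, 0.3))
```

Note how the subject who failed from cause 2 at $T = 3$ remains in the risk set for the cause-1 event at $T = 5$; this retention of competing-cause failures is the defining feature of the subdistribution risk set.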

2. Model Architecture and Training

The architecture consists of individual FeatureNets for each scalar feature, each producing an embedding $\mathbf{h}_i = f_i(x_i) \in \mathbb{R}^d$. For each risk $k$, a learnable projection $\mathbf{w}_{i,k}$ maps this embedding to a scalar, and the log-risk for each cause is the sum of all features' projections (Ramachandram et al., 27 May 2025):

  • FeatureNets: 1–3 hidden layers (width 32–256), $\tanh$ activations, dropout, feature dropout, optional batch normalization, $L_2$ regularization.
  • Risk-specific projections: Each $(i, k)$ pair gets its own projection vector, potentially $L_2$-normalized.
  • Additive aggregation: $\eta_k(\mathbf{x}) = \sum_i \mathbf{w}_{i,k}^\top f_i(x_i)$.

Training minimizes the Fine–Gray (or Cox-type) negative partial log-likelihood:

$$\mathcal{L}_k^{\text{FG}} = -\sum_{i: E_i = k}\left[ \eta_k(\mathbf{x}_i) - \log \sum_{j \in \mathcal{R}_k^{\text{sub}}(T_i)} w_j(T_i) \exp(\eta_k(\mathbf{x}_j)) \right]$$

with class weights $\omega_k$ for balancing rare event types and a regularization term $\gamma\|\Theta\|_2^2$. Optimization uses AdamW with a batch size of 256 and early stopping; hyperparameters are tuned via Optuna (Ramachandram et al., 16 Nov 2025, Ramachandram et al., 27 May 2025).
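A minimal sketch of the cause-$k$ loss follows, on made-up data, with the simplifying assumptions that the weights $w_j(T_i)$ are all 1, class weights are omitted, and the $\gamma\|\Theta\|_2^2$ term on network parameters is left out. It is a worked numeric example, not a training loop.

```python
import numpy as np

# Made-up toy data: times, event types (0 = censored), and model log-risks.
T = np.array([2.0, 3.0, 5.0, 7.0])
E = np.array([1,   2,   1,   0  ])
eta = np.array([0.4, -0.1, 0.2, 0.0])

def fg_neg_loglik(eta, T, E, k=1):
    """Negative Fine-Gray partial log-likelihood for cause k,
    assuming unit IPCW weights w_j(t) = 1 and no class weights."""
    loss = 0.0
    for i in np.where(E == k)[0]:
        # Subdistribution risk set at T_i: still at risk, or failed earlier
        # from a competing cause.
        R = (T >= T[i]) | ((E != k) & (E != 0) & (T < T[i]))
        loss -= eta[i] - np.log(np.sum(np.exp(eta[R])))
    return loss

print(fg_neg_loglik(eta, T, E))
```

As with any partial likelihood, the loss is invariant to adding a constant to every $\eta_k(\mathbf{x}_j)$, so only relative risks are identified; the baseline $\widehat F_{0k}$ absorbs the remaining scale.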

3. Feature Selection and Interaction Testing

The CRISPNAM-FG methodology is extendable with automated feature selection and interaction grouping in the style of Generalized Groves of Neural Additive Models (GGNAM) (Chen et al., 2022). The covariates are partitioned into:

  • Linear features ($L$): enter through linear terms $\beta_j x_j$
  • Singularly nonlinear features ($N$): enter via univariate neural nets $g_k(x_k)$
  • Interaction groups ($I$): small groups modeled by joint low-dimensional neural networks $h_v(x_v)$

Feature assignment to these categories is determined by a forward stepwise selection process: for every candidate, the loss improvement obtained by moving a variable from linear to nonlinear is evaluated, and only variables whose improvement exceeds a threshold $\epsilon$ are upgraded.

Once nonlinear features are identified, pairwise additive separability tests—implemented as held-out accuracy comparisons between full and blocked GAM structures—are deployed to partition features into locally interacting groups. Each interaction group is modeled with a small neural network, maintaining interpretability by restricting to low-dimensional interactions.
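The separability test can be sketched on synthetic data. The version below is an illustrative stand-in: cheap polynomial least squares replaces the small neural networks, and the 10% held-out-error threshold is an arbitrary choice for the example, not a value from the papers.

```python
import numpy as np

rng = np.random.default_rng(1)

def heldout_mse(X, y, interaction):
    """Fit on the first half, score on the second half.
    Additive basis in x1 and x2, optionally plus an x1*x2 interaction term."""
    x1, x2 = X[:, 0], X[:, 1]
    cols = [np.ones_like(x1), x1, x1**2, x2, x2**2]
    if interaction:
        cols.append(x1 * x2)
    B = np.column_stack(cols)
    n = len(y) // 2
    beta, *_ = np.linalg.lstsq(B[:n], y[:n], rcond=None)
    return np.mean((y[n:] - B[n:] @ beta) ** 2)

def interacts(X, y, threshold=0.1):
    """Flag an interaction when the joint fit beats the additive fit
    by more than `threshold` (relative held-out MSE improvement)."""
    m_add = heldout_mse(X, y, False)
    return (m_add - heldout_mse(X, y, True)) > threshold * m_add

X = rng.normal(size=(2000, 2))
y_sep = np.sin(X[:, 0]) + X[:, 1]**2 + 0.1 * rng.normal(size=2000)  # separable
y_int = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=2000)             # interacting
print(interacts(X, y_sep), interacts(X, y_int))
```

For the additively separable target the interaction column buys essentially nothing on held-out data, while for the product target the additive model cannot fit at all, so the test cleanly partitions the pair.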

4. Interpretability Mechanisms

Interpretability is ensured by the additive model structure:

  • Shape Functions: For each feature and risk, the function $s_{i,k}(x_i)$ quantifies the marginal, possibly nonlinear effect as a one-dimensional curve. These can be plotted directly against the empirical feature distribution.
  • Feature Importance: The overall importance of feature $i$ for risk $k$ is summarized by the mean absolute contribution:

$$\mathcal{I}_{i,k} = \frac{1}{n}\sum_{j=1}^n | s_{i,k}(x_{ij}) |$$

Bar plots or rankings provide audit trails within regulatory or clinical interpretation pipelines.

  • Auditability: For any individual prediction, the total log-risk $\eta_k(\mathbf{x})$ decomposes into its constituent contributions, allowing identification of risk-driving features.
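Given a matrix of evaluated shape-function contributions, both the importance ranking and a per-subject audit are one-liners. The contribution matrix below is random placeholder data standing in for $s_{i,k}(x_{ji})$ over a cohort.

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder cohort contributions: S[j, i] = s_{i,k}(x_{ji}) for one risk k.
S = rng.normal(size=(100, 4))

importance = np.mean(np.abs(S), axis=0)  # I_{i,k}, one value per feature
ranking = np.argsort(importance)[::-1]   # most important feature first

# Audit trail for one subject: eta_k(x_j) is exactly the sum of its terms.
j = 0
eta_j = S[j].sum()
print(ranking, eta_j)
```

Because the decomposition is exact by construction, no post-hoc attribution method (and no approximation error) is involved; the bar plot of `importance` is the audit artifact itself.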

In GGNAM-style architectures, only one- or two-dimensional components are ever used, ensuring that all interaction effects can be visualized as surfaces or heatmaps. If a variable does not meaningfully interact, additive separability tests ensure it is isolated as a univariate effect.

5. Empirical Evaluation and Performance

CRISPNAM-FG demonstrates robust empirical performance across survival and tabular classification/regression regimes:

  • Clinical survival benchmarks: On datasets such as the Framingham Heart Study, SUPPORT2, and PBC, CRISPNAM-FG achieves TD-AUCs from 0.832 to 0.986 across risks, with Brier scores ranging from 0.092 to 0.307. Discrimination outperforms the full neural Fine–Gray model and generally matches or exceeds black-box methods such as DeepHit, with calibration (Brier) within 0.01–0.02 of the state of the art (Ramachandram et al., 16 Nov 2025, Ramachandram et al., 27 May 2025).
  • Finance tabular tasks (GGNAM): On Taiwan Credit and Polish Bankruptcy, GGNAM outperforms both classical logistic regression and single-feature NAMs, reaching AUCs up to 0.907, and discovers critical interactions (e.g., between specific pairs of financial ratios) (Chen et al., 2022).
  • Large real-world cohort (GEMINI, diabetic foot): On 107,836 patient records, CRISPNAM-FG identifies clinically meaningful predictors (e.g., comorbidities, HbA1c, age effects with non-monotonicities due to competing risk), achieving TD-AUC 0.763 on the primary complication and outperforming other deep survival models (Ramachandram et al., 16 Nov 2025).

Stability across cross-validation folds is generally higher for CRISPNAM-FG than for interaction-rich black-box neural models, reflecting more robust generalization. In financial and regulated domains, the full auditability and simple diagnostics of each term confer an additional practical advantage.

6. Extensions, Limitations, and Ongoing Developments

The inherent additivity of CRISPNAM-FG promotes transparency but precludes capturing high-order or sharp interaction effects present in unconstrained networks. Potential extensions include:

  • Feature-graph extensions: Allowing FeatureNets to exchange information according to a known network, enabling the modeling of structured dependencies (e.g., physiological or genotypic networks) while preserving visibility of shape functions (Ramachandram et al., 27 May 2025).
  • Strict monotonicity/fairness: Incorporating architectural monotonicity penalties enforces fairness and regulatory compliance, especially for credit scoring and high-stakes predictions (Chen et al., 2022).
  • Structured, distributional, or multi-modal input: CRISPNAM-FG can be contrasted with orthogonalization-based frameworks (e.g., SSDR (Rügamer et al., 2020)), which guarantee identifiability when deep nets and structured effects overlap on variables.

Computational complexity is moderate: pairwise interaction or separability tests can become demanding for $p \gg 50$, suggesting possible algorithmic improvements via Hessian- or group-based proxies (Chen et al., 2022).

7. Theoretical Guarantees and Feature Selection

Variants of CRISPNAM-FG—particularly those with group-sparse penalties (SNAM) (Xu et al., 2022) or stepwise selection (GGNAM) (Chen et al., 2022)—possess rigorously established statistical guarantees:

  • Consistency: Estimation error of each component vanishes as $n \to \infty$, under group-LASSO regularization and feature independence.
  • Exact support recovery: Under suitable mutual incoherence and regularization, SNAM and GGNAM recover the true support—identifying active features with probability tending to 1.
  • Identifiability of individual effect functions: Provided features are mutually independent, the estimated shape for each feature converges to its true effect up to a constant.
  • Optimization convergence: Proximal and subgradient descent methods are globally convergent for convex losses, and empirical results support practical effectiveness for non-convex neural subnets.
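The group-sparse mechanism behind the support-recovery guarantee can be illustrated by the proximal operator of the group-LASSO penalty (group soft-thresholding): a feature's entire parameter group is zeroed out when its norm falls below the threshold, which is how a feature is deleted from the model. This is a generic sketch of the operator, not SNAM's training code.

```python
import numpy as np

def prox_group_lasso(groups, lam):
    """Proximal operator of lam * sum_g ||theta_g||_2 (group soft-threshold).
    Groups whose norm is below lam are set exactly to zero."""
    out = []
    for g in groups:
        norm = np.linalg.norm(g)
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append(scale * g)
    return out

# One strong feature group and one weak one (illustrative values).
groups = [np.array([3.0, 4.0]), np.array([0.1, 0.2])]
shrunk = prox_group_lasso(groups, lam=1.0)
print(shrunk[0], shrunk[1])  # first group shrunk toward 0, second zeroed exactly
```

The exact zeros are what make support recovery a well-posed event: a feature is either in the model with a full shape function or removed entirely, rather than attenuated toward zero as under a plain $L_2$ penalty.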

These results ensure that CRISPNAM-FG, and its regulated/sparse generalizations, deliver both statistical efficiency and interpretational clarity at scale.


Key References: (Ramachandram et al., 27 May 2025, Ramachandram et al., 16 Nov 2025, Chen et al., 2022, Agarwal et al., 2020, Xu et al., 2022, Rügamer et al., 2020).
