Sparse Deep Additive Model with Interactions

Updated 30 September 2025
  • The SDAMI framework combines sparse additive modeling and deep neural networks to isolate main effects and interactions in high-dimensional regression.
  • Its two-stage process leverages marginal screening and structured regularization to achieve both interpretability and high predictive accuracy.
  • Applications in neuroscience and medical diagnostics demonstrate SDAMI's effectiveness in identifying key variables and complex interactions.

The Sparse Deep Additive Model with Interactions (SDAMI) is a statistical learning framework that combines the interpretability and sparsity of additive models with the representational flexibility of deep neural networks, while explicitly disentangling main effects from interaction effects in high-dimensional regression. SDAMI operates under the principle that relevant interactions leave detectable marginal footprints and deploys a two-stage strategy that uses sparsity-driven variable screening, structured regularization, and modular neural subnetworks to achieve both high predictive accuracy and interpretability across settings with limited samples and large feature sets (Hung et al., 27 Sep 2025).

1. Structural Decomposition and Motivation

SDAMI seeks to address the challenges posed by "small $n$, large $p$" data, where complex nonlinear dependencies must be modeled in a form that remains transparent and sparse. The regression function is decomposed as

$$Y_i = \sum_{j \in \mathcal{M}} f_j(X_{ij}) + f(\mathbf{X}_{i,\mathcal{I}}) + \varepsilon_i,$$

where $\mathcal{M}$ indexes main effects, $\mathcal{I}$ the set of variables appearing primarily in interactions, and $f_j$, $f$ are nonlinear component functions. Each selected main effect $j$ is assigned a dedicated subnetwork approximating $f_j(\cdot)$, while interaction subnetworks are constructed only for those groups $\mathcal{I}$ where the data justify such complexity. This contrasts with conventional deep models, whose functional entanglement precludes clear attribution of variable influence.

The architecture is modular: all subnetworks are learned jointly but operate on disjoint low-dimensional projections of the input, supporting both scalability and interpretability. The model thus balances expressivity (by allowing nonlinear subnetworks) and parsimony (via sparsity constraints and effect disentanglement).
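
As an illustration of this modularity, the following is a minimal sketch assuming a PyTorch-style implementation; layer widths, activations, and the example index sets are illustrative assumptions, not the paper's specification:

```python
import torch
import torch.nn as nn

class SubNet(nn.Module):
    """Small MLP approximating a single component function f_j (or f_I)."""
    def __init__(self, in_dim, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

class SDAMINet(nn.Module):
    """Additive predictor: one univariate subnetwork per selected main effect,
    plus one multivariate subnetwork over the selected interaction variables."""
    def __init__(self, main_idx, inter_idx):
        super().__init__()
        self.main_idx = list(main_idx)    # estimated main effects (hat-M)
        self.inter_idx = list(inter_idx)  # interaction/footprint variables (hat-I)
        self.main_nets = nn.ModuleList([SubNet(1) for _ in self.main_idx])
        self.inter_net = SubNet(len(self.inter_idx)) if self.inter_idx else None

    def forward(self, X):
        # Each subnetwork sees only its own low-dimensional slice of the input.
        out = sum(net(X[:, [j]]) for net, j in zip(self.main_nets, self.main_idx))
        if self.inter_net is not None:
            out = out + self.inter_net(X[:, self.inter_idx])
        return out.squeeze(-1)

# Hypothetical usage: main effects on X_0 and X_3, interactions among X_5 and X_7.
model = SDAMINet(main_idx=[0, 3], inter_idx=[5, 7])
```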

2. Effect Footprint and Marginal Screening Principle

Central to SDAMI is the effect footprint concept. Even when a variable enters only through interactions, it typically leaves a marginal projection:

$$m_k(x) = \mathbb{E}\left[f(\mathbf{X}_{\mathcal{I}}) \mid X_k = x\right].$$

If $m_k(x)$ is nonconstant, a marginal effect manifests; this property is exploited to detect both main effects and interaction-only variables. In SDAMI's first stage, a sparse additive screening (e.g., via SpAM) identifies all variables with a nonzero marginal signal, producing an active set $\widehat{\mathcal{S}}$ that contains both genuine main effects and variables active solely through interactions.

This leverages a key property: higher-order interactions typically project residual signal onto univariate marginals, providing a theoretically justified screen for later refinement.
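
A simplified illustration of the footprint screen follows; it is not the SpAM backfitting algorithm itself, but a crude stand-in in which each variable's marginal signal is approximated by a low-degree polynomial fit. The polynomial degree and retention threshold are illustrative assumptions:

```python
import numpy as np

def footprint_screen(X, y, degree=3, threshold=0.05):
    """Return indices of variables whose marginal fit explains a
    non-negligible share of the response variance (the 'footprint')."""
    n, p = X.shape
    active = []
    for j in range(p):
        # Low-degree polynomial as a stand-in for a spline-based marginal smoother.
        coefs = np.polyfit(X[:, j], y, deg=degree)
        fitted = np.polyval(coefs, X[:, j])
        # Share of response variance captured by the marginal fit of X_j.
        r2 = 1.0 - np.var(y - fitted) / np.var(y)
        if r2 > threshold:
            active.append(j)
    return active
```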

3. Two-Stage Estimation and Variable Partitioning

SDAMI estimation comprises:

  1. Effect Footprint Screening: Fit a sparse additive model to all univariate components. Variables with nontrivial fitted functions are included in the active set $\widehat{\mathcal{S}}$.
  2. Partitioning and Regularization: Within $\widehat{\mathcal{S}}$, variables are partitioned into estimated main effects $\widehat{\mathcal{M}}$ and footprint variables $\widehat{\mathcal{I}}$ (variables that manifest only through interactions). A structured regularization, typically a group lasso with basis expansion, is then applied:

$$\min_\theta \frac{1}{n} \sum_{i=1}^n \left[ Y_i - \sum_j \mathrm{NN}^{(j)}(X_{ij}; \theta_j) - \mathrm{NN}^{(\mathcal{I})}(\mathbf{X}_{i,\mathcal{I}}; \theta_{\mathcal{I}}) \right]^2,$$

subject to layerwise penalties:

$$\|W^{(1)}_{m,j}\|_\infty \leq \kappa_m \|f_j\|, \qquad \|W^{(1)}_{\mathcal{I},j}\|_\infty \leq \kappa_{\mathcal{I}} \|f_{\mathcal{I}}\|.$$

Here, $\mathrm{NN}^{(j)}(\cdot)$ denotes the subnetwork for $f_j$, and $\mathrm{NN}^{(\mathcal{I})}(\cdot)$ the multivariate subnetwork for the selected interaction variables. Penalty parameters $\lambda_1, \lambda_2$ controlling main-effect and interaction sparsity are optimized, for example via Mallows' $C_p$ or cross-validation.

Group-lasso-like norm constraints act hierarchically: vanishing $L_2$ (or functional) norms prune irrelevant subnetworks, yielding both sparsity and interpretability.
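
To make the penalized objective concrete, here is a hedged sketch that reuses the SDAMINet sketch above; it replaces the paper's norm constraints with an additive group-lasso-style penalty on each subnetwork's first-layer weights, and the penalty form and optimizer are assumptions, with λ1 and λ2 playing the roles of the main-effect and interaction tuning parameters:

```python
import torch

def group_penalty(model, lam_main, lam_inter):
    """Group-lasso-style penalty on the first-layer weights of each subnetwork;
    driving a group norm to zero effectively prunes that subnetwork."""
    pen = torch.zeros(())
    for net in model.main_nets:
        pen = pen + lam_main * net.net[0].weight.norm(p=2)
    if model.inter_net is not None:
        pen = pen + lam_inter * model.inter_net.net[0].weight.norm(p=2)
    return pen

def fit(model, X, y, lam_main=1e-3, lam_inter=1e-2, epochs=500, lr=1e-2):
    """Minimize mean squared error plus the structured penalty."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.mean((y - model(X)) ** 2) + group_penalty(model, lam_main, lam_inter)
        loss.backward()
        opt.step()
    return model
```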

4. Subnetwork Construction and Modular Regularization

Each main effect and each identified interaction group, if justified by the data, is assigned a neural subnetwork. For variable $j$:

  • If $j \in \widehat{\mathcal{M}}$, construct a univariate subnetwork approximating $f_j(\cdot)$.
  • If $j \in \widehat{\mathcal{I}}$, i.e., the variable is involved only in non-additive effects, include it in the multivariate interaction network.

This additive-modular approach enables each $f_j$ and $f_{\mathcal{I}}$ to be visualized or inspected directly, conferring interpretability unattainable with generic DNNs. Pruning, regularization, and the two-step estimation ensure that only functionally relevant subnetworks remain active.
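
Because each fitted $f_j$ lives in its own subnetwork, it can be plotted directly over a grid of input values, which is one concrete way the modularity yields interpretability. Below is a sketch assuming the SDAMINet above and matplotlib; the grid bounds are illustrative:

```python
import torch
import matplotlib.pyplot as plt

def plot_main_effect(model, pos, x_min=-3.0, x_max=3.0, n_grid=200):
    """Plot the fitted component function of the pos-th selected main effect."""
    grid = torch.linspace(x_min, x_max, n_grid).unsqueeze(1)  # shape (n_grid, 1)
    with torch.no_grad():
        f_hat = model.main_nets[pos](grid).squeeze(-1)
    plt.plot(grid.squeeze(-1).numpy(), f_hat.numpy())
    plt.xlabel(f"X_{model.main_idx[pos]}")
    plt.ylabel("estimated f_j")
    plt.show()
```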

5. Adaptive Regularization and Penalty Selection

Structured regularization is deployed using norm-based constraints and/or group-lasso-like penalties. For the first-layer weights $W^{(1)}_{m,j}$ of the $j$th main-effect subnetwork, the constraint takes the form

$$\| W_{m,j}^{(1)} \|_\infty \leq \kappa_m \| f_j \|,$$

with $\kappa_m$ tuned to balance shrinkage and approximation power. Main-effect and interaction subnetworks are regularized with independent hyperparameters, often optimized by cross-validation or Mallows' $C_p$ criterion:

  • $\lambda_1$: penalizes the complexity (or number) of main-effect subnetworks.
  • $\lambda_2$: penalizes higher-order interactions.

This structure ensures that only variables or interactions justified by the data are selected.
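
One plausible way to select the two penalty levels is a grid search with K-fold cross-validation. The sketch below reuses the SDAMINet and fit helpers sketched earlier; the grid values and fold count are illustrative rather than prescribed by the paper:

```python
import numpy as np
import torch

def select_penalties(X, y, main_idx, inter_idx,
                     grid=(1e-4, 1e-3, 1e-2, 1e-1), k=5):
    """Pick (lambda_1, lambda_2) minimizing K-fold validation MSE."""
    n = X.shape[0]
    folds = np.array_split(np.random.permutation(n), k)
    best, best_err = None, float("inf")
    for lam1 in grid:
        for lam2 in grid:
            errs = []
            for val_idx in folds:
                train_idx = np.setdiff1d(np.arange(n), val_idx)
                model = SDAMINet(main_idx, inter_idx)
                fit(model, X[train_idx], y[train_idx],
                    lam_main=lam1, lam_inter=lam2)
                with torch.no_grad():
                    err = torch.mean((y[val_idx] - model(X[val_idx])) ** 2)
                errs.append(err.item())
            if np.mean(errs) < best_err:
                best, best_err = (lam1, lam2), float(np.mean(errs))
    return best
```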

6. Simulation Studies and Real-World Applications

Extensive simulation studies evaluate scenarios including strong main effects, combined main and interaction effects, varying sparsity levels, and different sample sizes. SDAMI consistently recovers the true signal structure with high true positive rates and low false positive rates across regimes.

In neuroscience, SDAMI was tested on fMRI data from the visual cortex, where tens of thousands of Gabor-filtered features were present. The model identified spatial and orientation variables relevant for main effects, and further revealed meaningful variable combinations in interaction subnetworks, with visualized effect functions providing interpretable neuroscientific insight. In medical diagnostics (e.g., diabetes progression), SDAMI achieved superior prediction performance (MSE, $R^2$) compared to classical DNNs and other sparse alternatives, while correctly identifying minimal variable subsets and interactions.

7. Theoretical Underpinnings and Implications

SDAMI methodology is theoretically grounded in the effect footprint property: marginal projections of interaction terms are sufficient for initial variable screening, which is then refined via sparsity-inducing regularization. The approach leverages insights from minimax detection boundaries (Gayraud et al., 2010), adaptive group lasso theory, and hierarchical regularization frameworks, ensuring robustness even in regimes with high ambient dimensionality and negligible main effects.

A plausible implication is that SDAMI architectures provide a principled route to interpretable deep prediction in scientific domains where understanding specific variable effects and their conditional dependencies is as critical as achieving high predictive performance. By enforcing structured sparsity and modularization, SDAMI enables deep models to operate under statistical guarantees typically associated with classical additive models and variable selection approaches, while offering superior function approximation capabilities.

Summary Table: Key Components of SDAMI

| Component | Purpose | Methodological Detail |
|---|---|---|
| Effect footprint | Marginal screen for all impactful variables | $m_k(x) = \mathbb{E}[f(\mathbf{X}_{\mathcal{I}}) \mid X_k = x]$ |
| Two-stage procedure | Identify and partition the active set | Footprint screening $\rightarrow$ group-lasso refinement |
| Modular subnetworks | Isolate and regularize each component | Dedicated NNs per main effect and interaction group |
| Structured regularization | Induce and maintain sparsity | Norm/lasso constraints on first-layer weights |
| Interpretability | Direct effect visualization | Subnetwork estimates of $f_j$ and $f_{\mathcal{I}}$ can be plotted |

In conclusion, Sparse Deep Additive Models with Interactions represent a convergence of deep learning, high-dimensional sparse estimation, and interpretable statistical modeling, offering a scalable and theoretically motivated solution for complex regression problems where both accuracy and transparent variable importance are paramount (Hung et al., 27 Sep 2025).
