
Average Marginal Component Effects (AMCEs)

Updated 29 January 2026
  • AMCEs are formal measures that quantify the expected change in outcomes when a feature is altered, averaging over other variables.
  • They are computed by contrasting predictions at different feature levels using causal models, ensuring isolated marginal effects even in complex architectures.
  • AMCEs are pivotal in applications such as conjoint analysis, policy evaluation, and transparent feature attribution in advanced predictive models.

Average Marginal Component Effects (AMCEs) are formal quantities for expressing the expected causal influence of feature manipulations in multivariate predictive and causal models. Originally developed in the context of conjoint analysis and widely adopted across causal inference, AMCEs operationalize the expected change in an outcome when a single feature is perturbed—from one value or "level" to another—averaged over the empirical or structural distribution of all other features. In the structural framework, AMCEs provide a rigorous, population-level measure of direct and indirect effects, foundational for both interpretability and causal analysis, and are central to evaluating feature contributions in complex models, including deep neural networks (Thielmann et al., 11 Apr 2025).

1. Formal Definition and Context

Let $f: \mathbb{R}^J \to \mathbb{R}$ denote a model or data-generating function, and write $\mathbf{x} = (x_j, \mathbf{x}_{-j})$, where $x_j$ is the feature of interest and $\mathbf{x}_{-j}$ collects the remaining components. The marginal feature effect of variable $j$ at value $v$ is defined as

$$\mathbb{E}\bigl[f(\mathbf{x}) \mid x_j = v\bigr] = \int f(v, \mathbf{x}_{-j})\, p(\mathbf{x}_{-j} \mid x_j = v)\, d\mathbf{x}_{-j},$$

which captures the predictive expectation of setting $x_j$ to $v$, marginalizing over the conditional distribution of the context features.

Within conjoint analysis and causal inference, the AMCE from level $a$ to level $b$ is formalized as

$$\mathrm{AMCE}_j(a \to b) = \mathbb{E}_{X_{-j}}\bigl[Y(b, X_{-j}) - Y(a, X_{-j})\bigr],$$

where $Y(\cdot)$ is a potential outcome or structural model and the expectation is taken with respect to the empirical or counterfactual distribution of the context (Thielmann et al., 11 Apr 2025). This quantity is the expected shift in predicted (or observed) outcomes when $x_j$ is changed from $a$ to $b$ with all other features sampled as observed.
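The expectation above lends itself to a straightforward Monte Carlo estimate: average the outcome difference between levels $b$ and $a$ over draws of the context. A minimal sketch, with an illustrative outcome model and context distribution (not taken from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x_j, x_rest):
    # Hypothetical outcome function standing in for Y(x_j, X_{-j});
    # the interaction term makes the effect of x_j context-dependent.
    return 2.0 * x_j + 0.5 * x_rest + 0.3 * x_j * x_rest

# Empirical distribution of the context features X_{-j} (illustrative).
x_rest_samples = rng.normal(loc=1.0, scale=1.0, size=100_000)

def amce(a, b):
    # AMCE_j(a -> b): mean outcome difference when x_j is set to b
    # versus a, with the context sampled as observed.
    return np.mean(model(b, x_rest_samples) - model(a, x_rest_samples))

est = amce(0.0, 1.0)
# Analytically: 2*(b - a) + 0.3*(b - a)*E[x_rest] = 2 + 0.3 = 2.3
print(round(est, 2))
```

Because the difference is averaged over the context distribution, the interaction term contributes through its mean, which is exactly what "averaging over other variables" means in the AMCE definition.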

2. AMCEs in Additive and Deep Models

Classical generalized additive models (GAMs) with link function $g$ are expressed as

$$g\!\left(\mathbb{E}\left[y \mid x_1, \dots, x_J\right]\right) = \beta_0 + \sum_{j=1}^{J} f_j(x_j),$$

which yields interpretable, additive effects for each feature. In such models, with centered basis functions (i.e., each $f_j$ of zero mean), the isolated marginal effect of feature $j$ at $v$ is $f_j(v)$, and the AMCE between $a$ and $b$ is simply $f_j(b) - f_j(a)$. This direct interpretability is typically lost in complex, high-capacity models such as deep neural networks, where feature interactions and non-additivity obscure marginal effects (Thielmann et al., 11 Apr 2025).
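In the additive case the averaging step disappears entirely, since no other feature interacts with $x_j$. A minimal sketch with a toy centered shape function (illustrative, not from the paper):

```python
import numpy as np

def f_j(x):
    # Toy centered shape function for one GAM feature; in a fitted
    # GAM this would come from the estimated basis expansion.
    return np.sin(x)

# In an additive model the AMCE from level a to b is just the
# difference of shape-function values; no expectation over the
# context is needed because f_j depends on x_j alone.
a, b = 0.0, np.pi / 2
amce = f_j(b) - f_j(a)  # sin(pi/2) - sin(0) = 1.0
print(amce)
```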

Recent adaptations such as the NAMformer, a tabular transformer network with explicit additive paths, restore the identifiability of marginal effects. Each feature $x_j$ is encoded via an uncontextualized embedding $E_j(x_j)$ and passed through a shallow MLP $f_j^\epsilon$ (its shape function), giving the model structure

$$g\left(\mathbb{E}[y \mid \mathbf{x}]\right) = \beta_0 + \sum_{j=1}^{J} f_j^\epsilon(E_j(x_j)) + G(\Xi_0).$$

Here, $G(\Xi_0)$ incorporates context via contextualized embeddings, while $f_j^\epsilon$ depends solely on $x_j$ through its embedding (Thielmann et al., 11 Apr 2025). After mean-centering, $f_j^\epsilon(E_j(v))$ provides the marginal effect for $x_j = v$.
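The decomposition can be sketched in numpy, with random weights standing in for the trained embeddings, shape MLPs, and context head; all sizes and functional forms below are illustrative stand-ins, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
J, d = 3, 4  # number of features and embedding dimension (illustrative)

# Uncontextualized embeddings E_j and shallow shape MLPs f_j^eps;
# the weights are random placeholders for trained parameters.
E_w = rng.normal(size=(J, d))             # E_j(x_j) = x_j * E_w[j]
W1 = rng.normal(size=(J, d, 8))
W2 = rng.normal(size=(J, 8))

def shape(j, x_j):
    # f_j^eps(E_j(x_j)): a function of x_j alone, so its marginal
    # effect can be read off directly after mean-centering.
    e = x_j * E_w[j]
    return np.tanh(e @ W1[j]) @ W2[j]

def G(x):
    # Stand-in for the context head over contextualized embeddings;
    # here a toy interaction term replaces the transformer part.
    return 0.1 * np.prod(x)

beta0 = 0.5

def predict(x):
    # Additive shape contributions plus the context term.
    return beta0 + sum(shape(j, x[j]) for j in range(J)) + G(x)

x = np.array([0.2, -1.0, 0.7])
print(predict(x))
```

The key structural point is that `shape(j, x_j)` never sees the other features; only `G` does, which is what keeps the per-feature marginal effects identifiable.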

3. Extraction and Computation of AMCEs

After model training, computation of marginal effect curves for feature $j$ proceeds as follows:

  • Select a grid $\{v_1, \ldots, v_K\}$ spanning the support of $x_j$.
  • For each $v_k$, compute the uncontextualized embedding $\varepsilon_j = E_j(v_k)$, then the shape output $f_j^\epsilon(\varepsilon_j)$.
  • The function $v_k \mapsto f_j^\epsilon(E_j(v_k))$ traces the estimated marginal effect curve.

The empirical AMCE is calculated as

$$\widehat{\mathrm{AMCE}}_j(a \to b) = f_j^\epsilon(E_j(b)) - f_j^\epsilon(E_j(a)).$$

This mechanism yields AMCEs directly as differences in shape-function outputs, requiring no post-hoc intervention or bespoke estimation procedures (Thielmann et al., 11 Apr 2025). If $x_j$ is continuous, finite differences of the shape function approximate the instantaneous marginal effect.
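The extraction steps above can be sketched as follows, with a hypothetical stand-in for the fitted composition $f_j^\epsilon \circ E_j$ (in practice this would be read from the trained model):

```python
import numpy as np

def shape_out(v):
    # Hypothetical f_j^eps(E_j(v)) for one feature; a placeholder
    # for the trained embedding + shape-MLP composition.
    return np.tanh(1.5 * v)

# 1. Grid spanning the support of x_j.
grid = np.linspace(-2.0, 2.0, 41)

# 2. Shape outputs on the grid, mean-centered so the curve is
#    identified up to the intercept beta_0.
curve = shape_out(grid) - shape_out(grid).mean()

# 3. Empirical AMCE as a difference of shape outputs; centering
#    cancels in the difference, so raw outputs suffice here.
a, b = -1.0, 1.0
amce_hat = shape_out(b) - shape_out(a)

# For continuous x_j, finite differences along the grid approximate
# the instantaneous marginal effect.
inst = np.gradient(curve, grid)
print(amce_hat)
```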

4. Theoretical Guarantees and Identifiability

The identifiability of AMCEs in the NAMformer is ensured algorithmically by applying independent dropout across the per-feature shape networks during training. With dropout masks $w \in \{0,1\}^{J+1}$ randomly masking each $f_j^\epsilon$ and the context head $G$, the risk decomposes as

$$R = \mathbb{E}_{(\mathbf{x}, y),\, w}\left[\mathcal{L}\left(y,\; \beta_0 + \sum_{j=1}^{J} w_j f_j^\epsilon(x_j) + w_{J+1}\, G(\Xi_0)\right)\right].$$

For a convex loss $\mathcal{L}$, the following bound guarantees recovery:

$$\mathbb{E}_{x_k}\left[\mathcal{L}\bigl(\beta_0 + f_k^\epsilon(x_k),\; \mathbb{E}[y \mid x_k]\bigr)\right] \;\leq\; \frac{R - R_{\mathrm{others}}(1 - p_k)}{p_k} \;\leq\; 2R,$$

where $p_k$ is the probability that only $f_k^\epsilon$ is active. As $R \to 0$ during training, each $f_k^\epsilon$ converges in population risk to the true conditional mean $\mathbb{E}[y \mid x_k]$, thereby restoring isolated marginal effects (Thielmann et al., 11 Apr 2025).
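The role of $p_k$ can be checked numerically. The sketch below assumes independent Bernoulli(1/2) masks over the $J$ shape heads plus the context head (an illustrative choice; the paper's dropout rates may differ), and estimates the probability that only $f_k^\epsilon$ survives:

```python
import numpy as np

rng = np.random.default_rng(3)
J = 3
n_draws = 200_000

# Independent 0/1 dropout masks over the J shape heads plus the
# context head, as in the risk decomposition above.
w = rng.integers(0, 2, size=(n_draws, J + 1))

# p_k: probability that ONLY f_k^eps is active, i.e. every other
# head (including the context head) is masked out.
k = 0
only_k = (w[:, k] == 1) & (w.sum(axis=1) == 1)
p_k_hat = only_k.mean()
print(p_k_hat)  # theoretical value: (1/2)**(J+1) = 0.0625
```

Whenever this event occurs during training, the loss evaluates $\beta_0 + f_k^\epsilon(x_k)$ against $y$ alone, which is the mechanism that pushes each shape head toward $\mathbb{E}[y \mid x_k]$.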

5. Applications and Methodological Impact

AMCEs furnish an interpretable and theoretically sound measure for evaluating the marginal importance of features in both classical and contemporary high-capacity models. In tabular transformer architectures such as the NAMformer, AMCE computation is intrinsic to the model structure, supporting transparent analysis and causal attribution. This approach circumvents the opacity of general black-box models while preserving competitive predictive performance, effectively bridging two major paradigms in statistical learning: interpretability and accuracy (Thielmann et al., 11 Apr 2025).

AMCEs are particularly pivotal in conjoint analysis, policy evaluation, and any domain requiring principled assessment of intervention effects at the population level. The ability to efficiently extract marginal effect estimates within transformer-based frameworks represents a significant advance for interpretable AI in tabular settings.

6. Connections to Other Marginal Effect Formalisms

AMCEs closely relate to other measures of marginal effects, such as average treatment effects (ATE) in causal inference, partial dependence plots (PDPs) in machine learning, and marginal effect curves in GAMs. In the additive regime, these concepts coincide; in interaction-heavy models, AMCEs provide an average over observed distributions, maintaining the precise interpretability needed for rigorous analysis. The NAMformer directly recovers AMCEs in a manner analogous to classical GAM-based or causal estimators but embedded within a high-performance, context-aware architecture (Thielmann et al., 11 Apr 2025).

References (1)
