
Average Marginal Component Effects (AMCEs)

Updated 29 January 2026
  • AMCEs are formal measures that quantify the expected change in outcomes when a feature is altered, averaging over other variables.
  • They are computed by contrasting predictions at different feature levels using causal models, ensuring isolated marginal effects even in complex architectures.
  • AMCEs are pivotal in applications such as conjoint analysis, policy evaluation, and transparent feature attribution in advanced predictive models.

Average Marginal Component Effects (AMCEs) are formal quantities for expressing the expected causal influence of feature manipulations in multivariate predictive and causal models. Originally developed in the context of conjoint analysis and widely adopted across causal inference, AMCEs operationalize the expected change in an outcome when a single feature is perturbed—from one value or "level" to another—averaged over the empirical or structural distribution of all other features. In the structural framework, AMCEs provide a rigorous, population-level measure of direct and indirect effects, foundational for both interpretability and causal analysis, and are central to evaluating feature contributions in complex models, including deep neural networks (Thielmann et al., 11 Apr 2025).

1. Formal Definition and Context

Let $f: \mathbb{R}^J \to \mathbb{R}$ denote a model or data-generating function, and write $\mathbf{x} = (x_j, \mathbf{x}_{-j})$, where $x_j$ is the feature of interest and $\mathbf{x}_{-j}$ collects the remaining components. The marginal feature effect of variable $j$ at value $v$ is defined as

$$\mathbb{E}\bigl[f(\mathbf{x}) \mid x_j = v\bigr] = \int f(v, \mathbf{x}_{-j})\, p(\mathbf{x}_{-j} \mid x_j = v)\, d\mathbf{x}_{-j},$$

which captures the predictive expectation of setting $x_j$ to $v$, marginalizing over the conditional distribution of the context features.

Within conjoint analysis and causal inference, the AMCE from level $a$ to level $b$ is formalized as

$$\mathrm{AMCE}_j(a \to b) = \mathbb{E}_{X_{-j}}\bigl[Y(b, X_{-j}) - Y(a, X_{-j})\bigr],$$

where $Y(\cdot)$ is a potential outcome or structural model and the expectation is taken with respect to the empirical or counterfactual distribution of the context (Thielmann et al., 11 Apr 2025). This quantity is the expected shift in predicted (or observed) outcomes when $x_j$ is changed from $a$ to $b$ with all other features sampled as observed.
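The expectation above lends itself to a straightforward Monte Carlo estimate: average the outcome difference between levels $b$ and $a$ over draws of the context. A minimal sketch, with an illustrative outcome model and context distribution (not taken from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x_j, x_rest):
    # Hypothetical outcome function standing in for Y(x_j, X_{-j});
    # the interaction term makes the effect of x_j context-dependent.
    return 2.0 * x_j + 0.5 * x_rest + 0.3 * x_j * x_rest

# Empirical distribution of the context features X_{-j} (illustrative).
x_rest_samples = rng.normal(loc=1.0, scale=1.0, size=100_000)

def amce(a, b):
    # AMCE_j(a -> b): mean outcome difference when x_j is set to b
    # versus a, with the context sampled as observed.
    return np.mean(model(b, x_rest_samples) - model(a, x_rest_samples))

est = amce(0.0, 1.0)
# Analytically: 2*(b - a) + 0.3*(b - a)*E[x_rest] = 2 + 0.3 = 2.3
print(round(est, 2))
```

Because the difference is averaged over the context distribution, the interaction term contributes through its mean, which is exactly what "averaging over other variables" means in the AMCE definition.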

2. AMCEs in Additive and Deep Models

Classical generalized additive models (GAMs) with link function $g$ are expressed as

$$g\!\left(\mathbb{E}\left[y \mid x_1, \dots, x_J\right]\right) = \beta_0 + \sum_{j=1}^{J} f_j(x_j),$$

which yields interpretable, additive effects for each feature. In such models, with centered basis functions (i.e., each $f_j$ of zero mean), the isolated marginal effect of feature $j$ at $v$ is $f_j(v)$, and the AMCE between $a$ and $b$ is simply $f_j(b) - f_j(a)$. This direct interpretability is typically lost in complex, high-capacity models such as deep neural networks, where feature interactions and non-additivity obscure marginal effects (Thielmann et al., 11 Apr 2025).
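In the additive case the averaging step disappears entirely, since no other feature interacts with $x_j$. A minimal sketch with a toy centered shape function (illustrative, not from the paper):

```python
import numpy as np

def f_j(x):
    # Toy centered shape function for one GAM feature; in a fitted
    # GAM this would come from the estimated basis expansion.
    return np.sin(x)

# In an additive model the AMCE from level a to b is just the
# difference of shape-function values; no expectation over the
# context is needed because f_j depends on x_j alone.
a, b = 0.0, np.pi / 2
amce = f_j(b) - f_j(a)  # sin(pi/2) - sin(0) = 1.0
print(amce)
```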

Recent adaptations such as the NAMformer, a tabular transformer network with explicit additive paths, restore the identifiability of marginal effects. Each feature $x_j$ is encoded via an uncontextualized embedding $E_j(x_j)$ and passed through a shallow MLP $f_j^\epsilon$ (its shape function), giving the model structure

$$g\left(\mathbb{E}[y \mid \mathbf{x}]\right) = \beta_0 + \sum_{j=1}^{J} f_j^\epsilon(E_j(x_j)) + G(\Xi_0).$$

Here, $G(\Xi_0)$ incorporates context via contextualized embeddings, while $f_j^\epsilon$ depends solely on $x_j$ through its embedding (Thielmann et al., 11 Apr 2025). After mean-centering, $f_j^\epsilon(E_j(v))$ provides the marginal effect for $x_j = v$.
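The decomposition can be sketched in numpy, with random weights standing in for the trained embeddings, shape MLPs, and context head; all sizes and functional forms below are illustrative stand-ins, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
J, d = 3, 4  # number of features and embedding dimension (illustrative)

# Uncontextualized embeddings E_j and shallow shape MLPs f_j^eps;
# the weights are random placeholders for trained parameters.
E_w = rng.normal(size=(J, d))             # E_j(x_j) = x_j * E_w[j]
W1 = rng.normal(size=(J, d, 8))
W2 = rng.normal(size=(J, 8))

def shape(j, x_j):
    # f_j^eps(E_j(x_j)): a function of x_j alone, so its marginal
    # effect can be read off directly after mean-centering.
    e = x_j * E_w[j]
    return np.tanh(e @ W1[j]) @ W2[j]

def G(x):
    # Stand-in for the context head over contextualized embeddings;
    # here a toy interaction term replaces the transformer part.
    return 0.1 * np.prod(x)

beta0 = 0.5

def predict(x):
    # Additive shape contributions plus the context term.
    return beta0 + sum(shape(j, x[j]) for j in range(J)) + G(x)

x = np.array([0.2, -1.0, 0.7])
print(predict(x))
```

The key structural point is that `shape(j, x_j)` never sees the other features; only `G` does, which is what keeps the per-feature marginal effects identifiable.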

3. Extraction and Computation of AMCEs

After model training, computation of marginal effect curves for feature $j$ proceeds as follows:

  • Select a grid $\{v_1, \ldots, v_K\}$ spanning the support of $x_j$.
  • For each $v_k$, compute the uncontextualized embedding $\varepsilon_j = E_j(v_k)$, then the shape output $f_j^\epsilon(\varepsilon_j)$.
  • The function $v_k \mapsto f_j^\epsilon(E_j(v_k))$ traces the estimated marginal effect curve.

The empirical AMCE is calculated as

$$\widehat{\mathrm{AMCE}}_j(a \to b) = f_j^\epsilon(E_j(b)) - f_j^\epsilon(E_j(a)).$$

This mechanism yields AMCEs directly as differences in shape-function outputs, requiring no post-hoc intervention or bespoke estimation procedures (Thielmann et al., 11 Apr 2025). If $x_j$ is continuous, finite differences of the shape function approximate the instantaneous marginal effect.
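The extraction steps above can be sketched as follows, with a hypothetical stand-in for the fitted composition $f_j^\epsilon \circ E_j$ (in practice this would be read from the trained model):

```python
import numpy as np

def shape_out(v):
    # Hypothetical f_j^eps(E_j(v)) for one feature; a placeholder
    # for the trained embedding + shape-MLP composition.
    return np.tanh(1.5 * v)

# 1. Grid spanning the support of x_j.
grid = np.linspace(-2.0, 2.0, 41)

# 2. Shape outputs on the grid, mean-centered so the curve is
#    identified up to the intercept beta_0.
curve = shape_out(grid) - shape_out(grid).mean()

# 3. Empirical AMCE as a difference of shape outputs; centering
#    cancels in the difference, so raw outputs suffice here.
a, b = -1.0, 1.0
amce_hat = shape_out(b) - shape_out(a)

# For continuous x_j, finite differences along the grid approximate
# the instantaneous marginal effect.
inst = np.gradient(curve, grid)
print(amce_hat)
```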

4. Theoretical Guarantees and Identifiability

The identifiability of AMCEs in the NAMformer is ensured algorithmically by applying independent dropout across the per-feature shape networks during training. With dropout masks $w \in \{0,1\}^{J+1}$ randomly masking each $f_j^\epsilon$ and the context head $G$, the risk decomposes as

$$R = \mathbb{E}_{(\mathbf{x}, y),\, w}\left[\mathcal{L}\left(y,\; \beta_0 + \sum_{j=1}^{J} w_j f_j^\epsilon(x_j) + w_{J+1}\, G(\Xi_0)\right)\right].$$

For a convex loss $\mathcal{L}$, the following bound guarantees recovery:

$$\mathbb{E}_{x_k}\left[\mathcal{L}\bigl(\beta_0 + f_k^\epsilon(x_k),\; \mathbb{E}[y \mid x_k]\bigr)\right] \;\leq\; \frac{R - R_{\mathrm{others}}(1 - p_k)}{p_k} \;\leq\; 2R,$$

where $p_k$ is the probability that only $f_k^\epsilon$ is active. As $R \to 0$ during training, each $f_k^\epsilon$ converges in population risk to the true conditional mean $\mathbb{E}[y \mid x_k]$, thereby restoring isolated marginal effects (Thielmann et al., 11 Apr 2025).
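The role of $p_k$ can be checked numerically. The sketch below assumes independent Bernoulli(1/2) masks over the $J$ shape heads plus the context head (an illustrative choice; the paper's dropout rates may differ), and estimates the probability that only $f_k^\epsilon$ survives:

```python
import numpy as np

rng = np.random.default_rng(3)
J = 3
n_draws = 200_000

# Independent 0/1 dropout masks over the J shape heads plus the
# context head, as in the risk decomposition above.
w = rng.integers(0, 2, size=(n_draws, J + 1))

# p_k: probability that ONLY f_k^eps is active, i.e. every other
# head (including the context head) is masked out.
k = 0
only_k = (w[:, k] == 1) & (w.sum(axis=1) == 1)
p_k_hat = only_k.mean()
print(p_k_hat)  # theoretical value: (1/2)**(J+1) = 0.0625
```

Whenever this event occurs during training, the loss evaluates $\beta_0 + f_k^\epsilon(x_k)$ against $y$ alone, which is the mechanism that pushes each shape head toward $\mathbb{E}[y \mid x_k]$.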

5. Applications and Methodological Impact

AMCEs furnish an interpretable and theoretically sound measure for evaluating the marginal importance of features in both classical and contemporary high-capacity models. In tabular transformer architectures such as the NAMformer, AMCE computation is intrinsic to the model structure, supporting transparent analysis and causal attribution. This approach circumvents the opacity of general black-box models while preserving competitive predictive performance, effectively bridging two major paradigms in statistical learning: interpretability and accuracy (Thielmann et al., 11 Apr 2025).

AMCEs are particularly pivotal in conjoint analysis, policy evaluation, and any domain requiring principled assessment of intervention effects at the population level. The ability to efficiently extract marginal effect estimates within transformer-based frameworks represents a significant advance for interpretable AI in tabular settings.

6. Connections to Other Marginal Effect Formalisms

AMCEs closely relate to other measures of marginal effects, such as average treatment effects (ATE) in causal inference, partial dependence plots (PDPs) in machine learning, and marginal effect curves in GAMs. In the additive regime, these concepts coincide; in interaction-heavy models, AMCEs provide an average over observed distributions, maintaining the precise interpretability needed for rigorous analysis. The NAMformer directly recovers AMCEs in a manner analogous to classical GAM-based or causal estimators but embedded within a high-performance, context-aware architecture (Thielmann et al., 11 Apr 2025).

References (1)
