Weighted Average Derivative Effects

Updated 9 December 2025
  • Weighted Average Derivative Effects (WADEs) are defined by integrating the local derivative of a conditional mean function with an arbitrary weighting measure, generalizing ADEs.
  • They facilitate efficient semiparametric estimation and robust causal inference across frameworks like quantile IV and kernel regression.
  • Optimal weighting, Riesz representer techniques, and kernel-based methods enhance WADEs’ practical implementation and policy interpretation.

Weighted average derivative effects (WADEs) generalize the concept of average derivative effects (ADEs) by integrating the local derivative of a structural or conditional mean function against an arbitrary weighting measure, producing a scalar estimand of central interest in nonparametric analysis, semiparametric efficiency, and causal inference. WADEs subsume the ADE as a special case, admit rich causal and policy interpretations, and serve as canonical functionals in quantile IV, kernel regression, and doubly robust estimation frameworks.

1. Formal Definition and General Properties

Let $(Y, A, Z)$ denote the observed outcome, a continuous exposure or treatment, and covariates, respectively. The conditional mean outcome function is $\mu(a, z) = E[Y \mid A = a, Z = z]$, with $a \mapsto \mu(a, z)$ differentiable. For a weight function $w(a, z)$ satisfying $E\{w(A, Z)\} \neq 0$, the WADE is

$$\theta_w = E[w(A, Z) \, \partial_a \mu(A, Z)].$$

When $w$ is nonnegative and normalized such that $E\{w(A, Z)\} = 1$, $\theta_w$ is a weighted average of the local slopes $\partial_a \mu(A, Z)$. The special case $w(a, z) = 1$ yields the unweighted ADE, $\theta = E[\partial_a \mu(A, Z)]$. In conditional mean regression with $g(x) = E[Y \mid X = x]$ and $X$ having density $f$, the WADE associated with $w(x)$ is $\theta_w = E[w(X) \, g'(X)] = \int w(x) \, g'(x) \, f(x) \, dx$ (Hines et al., 2023, Cattaneo et al., 2022).
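As a numerical illustration of the definition, the following minimal sketch approximates $\theta_w$ by Monte Carlo. It assumes a hypothetical design with $A \sim \mathrm{Uniform}(0, 1)$ and $\mu(a, z) = a^2 + z$, so that $\partial_a \mu(a, z) = 2a$. The design, the weight $w(a, z) = 2a$, and the helper name `wade_monte_carlo` are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def wade_monte_carlo(weight, dmu_da, a, z):
    """Sample analogue of E[w(A, Z) * d/da mu(A, Z)] for a known derivative."""
    return float(np.mean(weight(a, z) * dmu_da(a, z)))

rng = np.random.default_rng(0)
n = 200_000
a = rng.uniform(0.0, 1.0, size=n)   # assumed exposure distribution
z = rng.normal(size=n)              # covariate (does not enter this toy derivative)

# Derivative in a of the assumed mu(a, z) = a**2 + z.
dmu_da = lambda a, z: 2.0 * a

# w = 1 gives the unweighted ADE.
ade = wade_monte_carlo(lambda a, z: np.ones_like(a), dmu_da, a, z)
# w(a, z) = 2a is nonnegative with E[w] = 1 under Uniform(0, 1).
wade = wade_monte_carlo(lambda a, z: 2.0 * a, dmu_da, a, z)

print(f"ADE ~ {ade:.3f}, WADE ~ {wade:.3f}")
```

In this design the exact values are $E[2A] = 1$ for the ADE and $E[4A^2] = 4/3$ for the weighted version: the weight places more mass on larger exposures, where the slope is steeper.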

2. Riesz Representer Characterization and Influence Functions

Under regularity conditions, WADEs are bounded linear functionals of $\mu$ on the Hilbert space $\mathcal{H}$ of square-integrable functions of $(A, Z)$, yielding a representation via the Riesz representer $\alpha_w$:

$$\theta_w = \langle \alpha_w, \mu \rangle = E[\alpha_w(A, Z) \, \mu(A, Z)] = E[\alpha_w(A, Z) \, Y].$$

By integration by parts,

$$\alpha_w(a, z) = -\frac{\partial}{\partial a} w(a, z) - w(a, z) \frac{\partial_a f(a \mid z)}{f(a \mid z)},$$

where $f(a \mid z)$ denotes the conditional density of $A$ given $Z = z$. With estimates $\hat\alpha_w$ and $\hat\mu$ of the two nuisance functions, the doubly robust one-step estimator is then

$$\hat\theta_w = n^{-1} \sum_{i=1}^n \left[ \hat\alpha_w(A_i, Z_i) \{ Y_i - \hat\mu(A_i, Z_i) \} + w(A_i, Z_i) \, \partial_a \hat\mu(A_i, Z_i) \right].$$

This approach enables valid inference with cross-fitted or machine-learned nuisance estimators, provided they are mean-square consistent at sufficiently fast rates (Hines et al., 2023, Hines et al., 2021).
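The one-step estimator can be sketched numerically. The following is a minimal illustration under a hypothetical Gaussian design in which both nuisance functions are known in closed form; in practice they would be estimated, typically with cross-fitting. The design, the function `one_step_wade`, and its argument names are assumptions made for illustration. With $w = 1$ and $A \mid Z \sim N(Z, 1)$, the representer formula gives $\alpha_w(a, z) = a - z$.

```python
import numpy as np

def one_step_wade(y, a, z, w, alpha_hat, mu_hat, dmu_hat):
    """Mean of alpha_hat * (Y - mu_hat) + w * d/da mu_hat over the sample."""
    correction = alpha_hat(a, z) * (y - mu_hat(a, z))
    plug_in = w(a, z) * dmu_hat(a, z)
    return float(np.mean(correction + plug_in))

rng = np.random.default_rng(1)
n = 200_000
z = rng.normal(size=n)
a = z + rng.normal(size=n)                      # assumed: A | Z ~ N(Z, 1)
y = np.sin(a) + z + 0.5 * rng.normal(size=n)    # assumed: mu(a, z) = sin(a) + z

w = lambda a, z: np.ones_like(a)                # w = 1, so the target is the ADE
alpha = lambda a, z: a - z                      # -d/da log f(a | z) for N(z, 1)

# Both nuisance functions correct.
est_good = one_step_wade(
    y, a, z, w, alpha,
    mu_hat=lambda a, z: np.sin(a) + z,
    dmu_hat=lambda a, z: np.cos(a),
)

# Deliberately wrong outcome model (mu_hat = 0) with the correct representer.
zero = lambda a, z: np.zeros_like(a)
est_wrong_mu = one_step_wade(y, a, z, w, alpha, mu_hat=zero, dmu_hat=zero)

truth = np.exp(-1.0)   # E[cos(A)] when A ~ N(0, 2) marginally
print(est_good, est_wrong_mu, truth)
```

Both estimates should be close to $e^{-1}$ in this design. The second run illustrates one direction of double robustness: a misspecified outcome model is compensated by a correct representer.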

3. Semiparametric Efficiency, Optimal Weights, and Variance Bounds

The efficiency bound for WADEs is determined by the variance of the influence function. Denote the conditional outcome variance, which may be heteroscedastic, by $\sigma^2(a, z)$. Among all admissible weights, the optimal representer $\alpha^*$ minimizing the nonparametric efficiency bound satisfies

$$\alpha^*(a, z) = \frac{\{a - \tilde\pi(z)\} / \sigma^2(a, z)}{E[\{A - \tilde\pi(Z)\} A / \sigma^2(A, Z)]},$$

with $\tilde\pi(z)$ defined as

$$\tilde\pi(z) = \frac{E[A / \sigma^2(A, Z) \mid Z = z]}{E[1 / \sigma^2(A, Z) \mid Z = z]}.$$

Under homoscedasticity, the optimal WADE simplifies to

$$\Psi = \frac{E\{\mathrm{Cov}(A, Y \mid Z)\}}{E\{\mathrm{Var}(A \mid Z)\}},$$

estimable via sample-split cross-fitting, where $\hat\pi(z)$ and $\hat\mu(z)$ here denote estimates of $E[A \mid Z = z]$ and $E[Y \mid Z = z]$:

$$\hat\Psi = \frac{\sum_{i=1}^n \{A_i - \hat\pi(Z_i)\}\{Y_i - \hat\mu(Z_i)\}}{\sum_{i=1}^n \{A_i - \hat\pi(Z_i)\}^2}.$$
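The reduction to a ratio of covariances can be checked directly from the formula for $\alpha^*$. The following short derivation is a sketch that assumes $\sigma^2(a, z)$ is constant and writes $\pi(z) = E[A \mid Z = z]$:

```latex
\begin{align*}
\sigma^2(a, z) \equiv \sigma^2
  &\;\Rightarrow\; \tilde\pi(z) = E[A \mid Z = z] = \pi(z), \\
E[\{A - \pi(Z)\} A]
  &= E[\{A - \pi(Z)\}^2] = E\{\mathrm{Var}(A \mid Z)\}, \\
\alpha^*(a, z)
  &= \frac{a - \pi(z)}{E\{\mathrm{Var}(A \mid Z)\}}, \\
E[\alpha^*(A, Z) \, Y]
  &= \frac{E[\{A - \pi(Z)\} Y]}{E\{\mathrm{Var}(A \mid Z)\}}
   = \frac{E\{\mathrm{Cov}(A, Y \mid Z)\}}{E\{\mathrm{Var}(A \mid Z)\}} = \Psi.
\end{align*}
```

The constant $\sigma^2$ cancels between the numerator and denominator of $\alpha^*$, and the second line uses $E[\{A - \pi(Z)\} \pi(Z)] = 0$.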

The nonparametric efficiency bound for the optimally weighted estimand $\theta^* = E[\alpha^*(A, Z) \, Y]$ is

$$V^* = E[\alpha^*(A, Z)^2 \, \sigma^2(A, Z)],$$

with consistent variance estimation via plug-in influence-function sample averages (Hines et al., 2023, Hines et al., 2021).
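A cross-fitted version of the homoscedastic estimator $\hat\Psi$ can be sketched as follows. This is a minimal illustration, assuming a scalar covariate and using polynomial least squares as a stand-in for whatever nuisance learner one prefers; the simulated design and the helper names `cross_fit_poly` and `partialling_out_estimate` are hypothetical.

```python
import numpy as np

def cross_fit_poly(x, target, folds, degree=5):
    """Out-of-fold predictions of E[target | x] from a polynomial least-squares fit."""
    preds = np.empty_like(target, dtype=float)
    for k in np.unique(folds):
        train, test = folds != k, folds == k
        coefs = np.polyfit(x[train], target[train], deg=degree)
        preds[test] = np.polyval(coefs, x[test])
    return preds

def partialling_out_estimate(y, a, z, n_folds=5, seed=0):
    """Cross-fitted ratio estimator of E[Cov(A, Y | Z)] / E[Var(A | Z)]."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds    # random, nearly equal-sized folds
    res_a = a - cross_fit_poly(z, a, folds)      # A - pi_hat(Z)
    res_y = y - cross_fit_poly(z, y, folds)      # Y - mu_hat(Z)
    return float(np.sum(res_a * res_y) / np.sum(res_a ** 2))

# Assumed partially linear design in which the target equals 2.
rng = np.random.default_rng(2)
n = 5_000
z = rng.uniform(-1.0, 1.0, size=n)
a = z ** 2 + rng.normal(size=n)
y = 2.0 * a + np.cos(z) + rng.normal(size=n)

psi_hat = partialling_out_estimate(y, a, z)
print(f"psi_hat ~ {psi_hat:.3f}")
```

Each observation's residuals are computed from models fitted on the other folds, so the nuisance fits never see the data point they are evaluated on.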

4. WADEs in Nonparametric Quantile IV and Ill-posed Inverse Problems

WADEs characterize key functionals in nonparametric quantile IV regression (NPQIV), a non-separable, nonlinear, and ill-posed inverse model in which the structural quantile function $h_0$ satisfies, for a quantile level $\tau$,

$$E[1\{ Y \leq h_0(W) \} - \tau \mid X] = 0 \quad \text{a.s.}$$

The WADE of $h_0$ is

$$\theta_0 = E[\mu(W) \, h_0'(W)],$$

with $\mu$ a user-supplied weight function (distinct from the conditional mean of Section 1) and $h_0'$ the weak derivative of $h_0$. Semiparametric efficiency bounds for NPQIV WADEs depend on unknown conditional derivative operators and on the degree of ill-posedness, with the information bound singular or non-singular according to whether $\ell \in \mathrm{Range}(\mathbf{T}^*)$. Penalized sieve generalized empirical likelihood (GEL) estimators, constructed from the unconditional WADE moment restriction together with a growing number of unconditional moment restrictions, achieve consistency and, under regularity conditions, asymptotic normality (Chen et al., 2019).

5. Kernel-based Density-Weighted ADEs, Bandwidth, and Robust Inference

Density-weighted average derivative (DWAD) estimators operationalize WADEs in kernel regression contexts by setting $w(x) = f(x)$, the density of $X \in \mathbb{R}^d$:

$$\theta = E[f(X) \, g'(X)] = \int f(x)^2 \, g'(x) \, dx = -2 E[Y \dot f(X)],$$

where $\dot f$ denotes the derivative of $f$. The estimand is estimable using a symmetric kernel $K$ with derivative $\dot K$ and a bandwidth $h$:

$$\hat\theta = -\frac{2}{n(n-1)} \sum_{i \neq j} Y_i \frac{1}{h^{d+1}} \dot K\left(\frac{X_i - X_j}{h}\right).$$

Classical asymptotically linear (AL) theory requires restrictive smoothness and bandwidth conditions ($n h^{2P} \to 0$ and $n h^{d+2} \to \infty$, with $P$ the order of the kernel), under which the first-order distributional approximation does not depend on $h$. The small-bandwidth (SB) framework relaxes these conditions by allowing $h$ to shrink faster, which improves finite-sample robustness and yields Gaussian limits that account for both the linear and quadratic terms of the estimator, with debiased variance estimators:

$$\hat V_{SB} = n^{-1} \hat\Sigma - \binom{n}{2}^{-1} h^{-d-2} \hat\Delta.$$

SB inference achieves lower coverage error and improved robustness to the choice of $h$ relative to AL (Cattaneo et al., 2022). Edgeworth expansions explicitly characterize bias, skewness, and coverage accuracy for both AL and SB regimes.
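The kernel DWAD point estimator can be sketched for a scalar regressor ($d = 1$). The following is a minimal illustration, assuming a Gaussian kernel, an arbitrarily chosen bandwidth $h = 0.2$, and a simulated linear design; the function name `dwad_estimate` and all design choices are hypothetical, and the sketch does not implement the SB variance estimator.

```python
import numpy as np

def dwad_estimate(y, x, h):
    """Density-weighted average derivative for scalar X (d = 1), Gaussian kernel."""
    n = len(y)
    u = (x[:, None] - x[None, :]) / h            # u[i, j] = (X_i - X_j) / h
    # Derivative of the standard Gaussian kernel: K_dot(u) = -u * phi(u).
    k_dot = -u * np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(k_dot, 0.0)                 # exclude the i == j terms
    # -2 / (n (n - 1)) * sum_{i != j} Y_i * h^{-(d + 1)} * K_dot(u[i, j]), with d = 1.
    return float(-2.0 / (n * (n - 1)) * np.sum(y[:, None] * k_dot) / h ** 2)

# Assumed design: X ~ N(0, 1) and g(x) = x, so theta = E[f(X)] = 1 / (2 sqrt(pi)).
rng = np.random.default_rng(3)
n = 2_000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)

theta_hat = dwad_estimate(y, x, h=0.2)
theta_true = 1.0 / (2.0 * np.sqrt(np.pi))
print(f"theta_hat ~ {theta_hat:.3f}, theta ~ {theta_true:.3f}")
```

The full pairwise matrix costs $O(n^2)$ memory, which is acceptable for a few thousand observations; larger samples would call for blocking over rows.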

6. Causal and Policy Interpretations

WADEs admit causal interpretations as weighted averages of local incremental or “infinitesimal shift” effects, quantifying the average outcome response to marginal changes in exposure under the observed exposure distribution or a stochastic intervention law $Q$. For the ADE ($w = 1$, $Q = f$), $\theta = E[\partial_a \mu(A, Z)]$ is the average effect of an infinitesimal shift in exposure. Under general $Q$ and $w$, writing $e$ for the exposure, $x$ for the covariates, and $m(e, x)$ for the conditional mean outcome, the estimand

$$\theta(Q) = \int_x \int_e w(e, x) \, \frac{\partial m(e, x)}{\partial e} \, dQ(e \mid x) \, dP_X(x)$$

measures the weighted average slope of $m(e, x)$ under the intervention law $Q$. The optimal $Q^*$, identified via the calculus of variations, minimizes the semiparametric variance bound among all such interventions, generalizing Crump-style overlap-weighted ATE principles to continuous exposures. WADEs thus serve as key estimands for incremental dose-response effects, overlap weighting, and robust causal analyses (Hines et al., 2023, Hines et al., 2021).
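The “infinitesimal shift” reading of the ADE can be checked numerically: the derivative at $\delta = 0$ of the mean outcome under the shift $A \mapsto A + \delta$ equals $E[\partial_a \mu(A, Z)]$. The following minimal sketch assumes a hypothetical conditional mean $\mu(a, z) = \sin(a) + a z$ and a simulated exposure distribution, and compares a central finite difference with the analytic average derivative.

```python
import numpy as np

def mu(a, z):
    """Assumed conditional mean, for illustration only."""
    return np.sin(a) + a * z

def dmu_da(a, z):
    """Analytic derivative of the assumed mu with respect to a."""
    return np.cos(a) + z

rng = np.random.default_rng(4)
n = 100_000
z = rng.normal(size=n)
a = 0.5 * z + rng.normal(size=n)

delta = 1e-3
# Central finite difference of the mean outcome under the shift A -> A + delta.
shift_effect = float(np.mean(mu(a + delta, z) - mu(a - delta, z)) / (2.0 * delta))
# Average derivative effect on the same sample.
ade = float(np.mean(dmu_da(a, z)))

print(f"shift effect ~ {shift_effect:.6f}, ADE ~ {ade:.6f}")
```

The two quantities agree up to the $O(\delta^2)$ truncation error of the finite difference.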

7. Practical Implementation and Guidelines

  • Basis Selection: Linear sieves (e.g., B-splines, wavelets, polynomials) for functional parameter approximation; component basis functions for instruments and exposures increase with sample size (Chen et al., 2019).
  • Penalty Tuning: Cross-validation, information criteria, or rule-of-thumb sequences (e.g., $\gamma_K \propto K^{-2m}$) subject to prior smoothness specifications (Chen et al., 2019).
  • Bandwidth in DWAD: Prefer a small bandwidth for bias reduction in density-weighted estimation, ensuring $n^2 h^d \to \infty$ (Cattaneo et al., 2022).
  • Variance Estimation: Use empirical influence-function plug-in estimators for WADE SEs; in kernel settings, apply SB variance estimators for robust coverage (Cattaneo et al., 2022).
  • Cross-fitting: Adopt $K$-fold sample splitting for nuisance function estimation in debiased machine-learning WADE estimators, facilitating root-$n$ consistency and valid inference under weak regularity (Hines et al., 2023, Hines et al., 2021).
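The variance-estimation guideline can be sketched as a small helper that turns estimated influence-function values into a plug-in standard error and Wald interval. This is a minimal illustration: the helper name `wald_interval`, the normal critical value 1.96, and the simulated Gaussian design (with known nuisance functions, $w = 1$, and $\alpha_w(a, z) = a - z$) are assumptions made for the example.

```python
import numpy as np

def wald_interval(estimate, influence_values, z_crit=1.96):
    """Plug-in standard error and Wald interval from influence-function values."""
    n = len(influence_values)
    se = float(np.std(influence_values, ddof=1) / np.sqrt(n))
    return se, (estimate - z_crit * se, estimate + z_crit * se)

# Assumed design: A | Z ~ N(Z, 1) and mu(a, z) = sin(a) + z, nuisances known.
rng = np.random.default_rng(5)
n = 20_000
z = rng.normal(size=n)
a = z + rng.normal(size=n)
y = np.sin(a) + z + 0.5 * rng.normal(size=n)

# One-step summands: alpha * (Y - mu) + w * d/da mu, with w = 1.
phi = (a - z) * (y - (np.sin(a) + z)) + np.cos(a)
theta_hat = float(np.mean(phi))

# Centered summands serve as estimated influence-function values.
se, ci = wald_interval(theta_hat, phi - theta_hat)
print(f"theta_hat ~ {theta_hat:.3f}, se ~ {se:.4f}, 95% CI ~ ({ci[0]:.3f}, {ci[1]:.3f})")
```

With estimated nuisances, the same helper would be applied to the cross-fitted summands; in kernel settings the SB variance estimator replaces this plug-in formula.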

WADEs unify average derivative methodologies in economics, causal inference, and semiparametric theory, facilitating efficient estimation, robust inference, and interpretable “local” effects in models with continuous exposures, arbitrary weighting, and ill-posed inverse structures.
