Counterfactual Density Effects and the German East–West Income Gap
Abstract: We propose a novel framework for conducting causal inference based on counterfactual densities. While the current paradigm of causal inference is mostly focused on estimating average treatment effects (ATEs), which restricts the analysis to the first moment of the outcome variable, our density-based approach is able to detect causal effects based on general distributional characteristics. Following the Oaxaca-Blinder decomposition approach, we consider two types of counterfactual density effects that together explain observed discrepancies between the densities of the treated and control group. First, the distribution effect is the counterfactual effect of changing the conditional density of the control group to that of the treatment group, while keeping the covariates fixed at the treatment group distribution. Second, the covariate effect represents the effect of a hypothetical change in the covariate distribution. Both effects have a causal interpretation under the classical unconfoundedness and overlap assumptions. Methodologically, our approach is based on analyzing the conditional densities as elements of a Bayes Hilbert space, which preserves the non-negativity and integration-to-one constraints. We specify a flexible functional additive regression model estimating the conditional densities. We apply our method to analyze the German East–West income gap, i.e., the observed differences in wages between East Germans and West Germans. While most of the existing studies focus on the average differences and neglect other distributional characteristics, our density-based approach is suited to detect all nuances of the counterfactual distributions, including differences in probability masses at zero.
Explain it Like I'm 14
What this paper is about
This paper looks for a better way to compare two groups—like people from East and West Germany—by studying the whole shape of their income distributions, not just their average income. Instead of asking “How different are their averages?”, the authors ask “How do their entire income patterns differ across all pay levels, including zero?” They introduce a new, more detailed method to make “what-if” comparisons (counterfactuals) and apply it to the East–West income gap in Germany.
The main questions the paper asks
In simple terms, the paper asks:
- How can we tell whether differences between two groups come from who they are (their characteristics like age, education, industry) versus how those characteristics translate into outcomes (the “rules” or conditions that turn characteristics into pay)?
- Can we do this not just for the average but for every part of the income distribution (low earners, middle, high earners, and even people with zero income)?
- When we compare East and West Germans, how much of the income gap is due to:
  - Covariates: different mixes of people (for example, different education levels)?
  - Distributional factors: different “mapping” from characteristics to pay (for example, different pay-setting or labor market conditions)?
- Do these differences look the same for men and women?
How the authors approach the problem (explained simply)
Think of an income distribution like the shape of a pile of sand spread across a table showing how many people earn each pay level. A “counterfactual” is a controlled “what-if” world. For example: “What would East Germans’ income shape look like if they had West Germans’ mix of characteristics?” or “What would it look like if we kept East Germans’ characteristics but switched to West-style pay-setting?”
To break this down, the authors separate two effects:
- Covariate effect: What changes if we swap the groups’ characteristics (like age, education, industry shares) but keep the way pay is determined the same? Think of this as changing the ingredients in a recipe.
- Distribution effect: What changes if we keep characteristics the same but swap how those characteristics turn into pay? Think of this as changing the oven or baking rules—same ingredients, different process.
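In symbols, the two swaps can be sketched as follows. The notation is an assumption made for this illustration and does not come from the summary: $d \in \{0, 1\}$ labels the two groups, $f_d(y \mid x)$ is the income density in group $d$ for people with characteristics $x$ (the "rules"), and $F_{X \mid d'}$ is the distribution of characteristics in group $d'$ (the "ingredients").

```latex
\begin{align*}
f_{\langle d \mid d' \rangle}(y) &= \int f_d(y \mid x)\,\mathrm{d}F_{X \mid d'}(x), \qquad d, d' \in \{0,1\},\\
\text{covariate effect:}\quad & f_{\langle 0 \mid 1 \rangle} \ \text{versus}\ f_{\langle 0 \mid 0 \rangle} \quad \text{(same rules, different characteristics)},\\
\text{distribution effect:}\quad & f_{\langle 1 \mid 1 \rangle} \ \text{versus}\ f_{\langle 0 \mid 1 \rangle} \quad \text{(same characteristics, different rules)}.
\end{align*}
```

Here $f_{\langle 1 \mid 1 \rangle}$ and $f_{\langle 0 \mid 0 \rangle}$ are the two observed income shapes, while $f_{\langle 0 \mid 1 \rangle}$ is the "what-if" shape that mixes the rules of one group with the characteristics of the other.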
Instead of subtracting one shape from the other (which can be misleading in ranges where very few people fall), they compare the two shapes using ratios (how many times bigger or smaller one is than the other). Ratios work better even in “thin” parts of the distribution, like the very low or very high ends.
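A toy calculation may help make the ratio comparison concrete. The sketch below is not the paper's estimator: it assumes a single binary characteristic, three income bins, and made-up probabilities, and it treats East as the group whose rules and characteristics are swapped in. All variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: one binary covariate (0 = no degree, 1 = degree)
# and three income bins (low, middle, high). All numbers are made up.

# Conditional income distributions: rows index the covariate value,
# columns index the income bin. Each row sums to one.
f_west = np.array([[0.30, 0.50, 0.20],
                   [0.10, 0.40, 0.50]])
f_east = np.array([[0.45, 0.45, 0.10],
                   [0.20, 0.50, 0.30]])

# Covariate distributions P(X = x | group).
p_x_west = np.array([0.6, 0.4])
p_x_east = np.array([0.5, 0.5])


def mix(cond, p_x):
    """Average the conditional distributions over a covariate distribution."""
    return p_x @ cond


obs_west = mix(f_west, p_x_west)        # West rules, West characteristics
obs_east = mix(f_east, p_x_east)        # East rules, East characteristics
counterfactual = mix(f_west, p_x_east)  # West rules, East characteristics

# Ratio-scale effects, one value per income bin.
distribution_effect = obs_east / counterfactual  # swap rules, same people
covariate_effect = counterfactual / obs_west     # swap people, same rules
total_effect = obs_east / obs_west

print("distribution effect:", np.round(distribution_effect, 3))
print("covariate effect:   ", np.round(covariate_effect, 3))
print("total effect:       ", np.round(total_effect, 3))
```

On the ratio scale the two effects multiply to the total effect in every bin, so a value above 1 means more people in that bin and a value below 1 means fewer.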
How they estimate the shapes:
- They use a flexible statistical model to learn the entire “shape” of incomes given people’s characteristics.
- They build the shape from many small pieces (like Lego blocks or smooth curves) so it can bend and fit the data, including tricky cases like a big spike at zero income.
- A special mathematical framework (called a Bayes Hilbert space) ensures the estimated shapes are always valid densities (they’re never negative and they always add up to 100%).
- For speed and stability, they approximate the problem by grouping incomes into small bins and using a counting model (similar to how you’d count how many observations land in each bin).
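The binning step and the "always a valid density" property can be illustrated with a small sketch. It assumes the centred log-ratio (clr) transform, a standard device for Bayes Hilbert spaces; whether the paper uses exactly this construction is an assumption, and the paper's handling of the spike at zero is not reproduced here. The data, the number of bins, and the pseudo-count are all made up.

```python
import numpy as np


def clr(density, widths):
    """Centred log-ratio transform of a strictly positive binned density."""
    log_f = np.log(density)
    # Subtract the width-weighted mean so the result integrates to zero.
    return log_f - np.sum(log_f * widths) / np.sum(widths)


def clr_inverse(g, widths):
    """Map any function on the bins back to a valid density."""
    unnormalised = np.exp(g - g.max())  # shift for numerical stability
    return unnormalised / np.sum(unnormalised * widths)


rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=7.5, sigma=0.6, size=5_000)  # made-up data

edges = np.linspace(0.0, incomes.max(), 41)  # 40 equal-width bins
widths = np.diff(edges)
counts, _ = np.histogram(incomes, bins=edges)

# Add a small pseudo-count so that empty bins do not give log(0).
density = (counts + 0.5) / np.sum((counts + 0.5) * widths)

g = clr(density, widths)
# An arbitrary linear operation in clr space (scaling plus a linear trend)
g_new = 0.5 * g + np.linspace(-1.0, 1.0, g.size)
# still maps back to a positive density that integrates to one.
f_new = clr_inverse(g_new, widths)

print("all positive:", bool(np.all(f_new > 0)))
print("integral:", float(np.sum(f_new * widths)))
```

The design point is that a regression model can work freely on the transformed scale, where ordinary addition and scaling are allowed, and the back-transform restores the density constraints automatically.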
They also:
- Check their method on simulated (fake) data to see if it recovers the right shapes.
- Apply it to real data on German incomes.
Two important assumptions they rely on (in kid-friendly terms):
- We’ve measured the key factors that affect both whether someone is from East/West and how much they earn (no big hidden biases).
- The two groups are comparable enough that we can imagine swapping characteristics between them (“enough overlap”).
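The second assumption can be checked roughly when characteristics are grouped into cells. The sketch below is an illustration only, not a diagnostic from the paper: the data are simulated, the cell definition and the threshold `eps` are hypothetical, and one cell is deliberately made almost entirely West so that the check has something to flag.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a group label (1 = East, 0 = West) and a covariate
# cell id per person (e.g. an education-by-age-band category).
n = 2_000
group = rng.integers(0, 2, size=n)
cell = rng.integers(0, 6, size=n)

# Make cell 5 almost exclusively West to create an overlap problem.
mask = cell == 5
group[mask] = (rng.random(int(mask.sum())) < 0.01).astype(int)


def overlap_table(group, cell, eps=0.05):
    """Share of group 1 in each cell, flagged if outside [eps, 1 - eps]."""
    rows = []
    for c in np.unique(cell):
        share = group[cell == c].mean()
        flagged = bool(share < eps or share > 1 - eps)
        rows.append((int(c), float(share), flagged))
    return rows


table = overlap_table(group, cell)
for c, share, flagged in table:
    note = "  <- weak overlap" if flagged else ""
    print(f"cell {c}: share East = {share:.2f}{note}")
```

A flagged cell means that one group has almost nobody with those characteristics, so any "what-if" swap involving that cell rests on extrapolation rather than data.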
What they find and why it matters
Here are the main takeaways from their application to the German East–West income gap:
- The gap has narrowed since reunification, but it hasn’t disappeared.
- Most of the remaining differences come from the distribution effect—how characteristics are turned into pay—rather than from different mixes of characteristics. In other words, it’s more about the “baking rules” than the “ingredients.”
- These differences are stronger for men than for women, suggesting the gap is, to a large extent, a male-specific issue.
- Their method captures important features that average-focused tools miss—like bimodality (two peaks) and a big pile at zero income—and shows exactly where in the distribution differences are largest (for example, at low pay levels).
From the simulations:
- Their approach estimates the whole distribution reliably, especially as sample size grows.
- It performs at least as well as, and often better than, common alternatives in estimating the detailed shapes.
- A big advantage is that it naturally handles mixed outcomes—like a spike at zero income plus a smooth spread of positive incomes—without special hacks.
Why this research is useful
- For policy: If the gap is mostly about how pay is determined (the “rules”), then policies should focus on wage-setting, labor market institutions, or workplace practices, not just on changing who is in which group (like education shares).
- For fairness: By showing where in the distribution the gap is biggest (for example, among low-wage men), policymakers can target help more precisely.
- For science: This density-based approach looks at the whole picture, not just the average or a handful of percentiles, and works even when many people have zero income. That makes it useful for studying many real-world problems where outcomes aren’t smooth or simple.
In short, the paper offers a clearer, more complete way to compare groups across the entire distribution, shows how to separate “who people are” from “how the system rewards them,” and applies these ideas to shed new light on the East–West income gap in Germany.
Knowledge Gaps
Unresolved gaps and open questions
Below is a single, consolidated list of what remains missing, uncertain, or unexplored in the paper, phrased to be actionable for future research:
- Identification and causal assumptions
  - No sensitivity analysis to violations of unconfoundedness; develop formal diagnostics and robust/sensitivity procedures (e.g., Rosenbaum bounds, bias functions, or partial identification) for density-based effects.
  - Overlap/positivity is assumed but not diagnosed; provide practical overlap checks for high-dimensional covariates and propose trimming or stabilized weighting strategies suited to density ratios.
  - Treatment-induced (post-treatment) covariates may be included in the decomposition; clarify identification conditions and provide mediator-aware decompositions to avoid conflating mediation with composition.
  - External validity and transportability of counterfactual densities are not discussed; establish conditions and methods for transporting density effects to new populations.
- Formal properties of the proposed estimands
  - Asymptotic theory is provided for the conditional density parameters, but not for the derived counterfactual density effects TE(y), DE(y), CE(y); derive consistency, convergence rates, and functional CLTs for these ratio-valued functionals.
  - Inference for TE(y), DE(y), CE(y) relies on Wald-region draws for θ; develop valid pointwise and uniform confidence bands for y ∈ support, with control of simultaneous error rates and multiple comparisons across y.
  - Stability of ratios in low-density regions is not characterized; analyze bias–variance tradeoffs, define safe regions of support, and propose stabilized estimators (e.g., ridge on log-densities, truncation, or density-ratio regularization).
- Decomposition into covariate contributions
  - Proposed CE_j(y) and DE_j(y) do not aggregate to the total effect and have no formal identification or inferential theory; provide conditions under which interpretations are valid and develop confidence bands for these partial effects.
  - Impact of interactions and dependence among covariates on CE_j(y), DE_j(y) is not analyzed; clarify interpretability under interactions and propose alternative “Shapley-like” or game-theoretic decompositions adapted to Bayes Hilbert spaces.
  - Path-dependence is avoided, but potential dependence on the choice of reference group/distribution remains; study alternative reference choices and their effect on CE_j(y), DE_j(y).
- Estimation and tuning choices
  - Dependence on histogram binning for the Poisson approximation is not examined; provide guidance on bin number/width selection, analyze induced discretization bias, and study convergence as bin width shrinks in finite samples.
  - Basis design and penalty tuning (for both covariates and outcome domain) are critical but lack principled selection rules; develop data-driven methods (e.g., cross-validation, information criteria) tailored to Bayes Hilbert regression.
  - Model selection and variable selection for high-dimensional X are not addressed; design sparsity-inducing penalties, screening rules, or group lasso variants compatible with the Bayes space structure.
  - Computational scalability with many covariates, interactions, and large n is not studied; quantify complexity, provide algorithmic accelerations, and benchmark runtimes/memory against alternatives.
- Robustness and double robustness
  - The estimator depends solely on conditional density modeling; investigate doubly robust or multiply robust strategies for counterfactual densities that combine outcome-density modeling with weighting/propensity modeling.
  - Misspecification robustness is not characterized; derive bounds or bias diagnostics when the additive Bayes Hilbert model is misspecified, especially for mixed discrete–continuous outcomes.
- Comparisons and benchmarking
  - Simulation benchmarks use basic kernel estimators and omit modern conditional density methods (e.g., conditional normalizing flows, mixture-of-experts, copulas); conduct head-to-head comparisons on accuracy, stability, and computation.
  - Simulations do not cover mixed-type outcomes (with point masses) or continuous covariates with moderate/high dimension; extend simulations to zero-inflated outcomes, heavy tails, bimodality, and overlap violations to reflect the application.
  - No evaluation of coverage properties for the proposed uncertainty quantification; simulate to assess empirical coverage and power of pointwise and uniform bands for TE(y), DE(y), CE(y), CE_j(y), DE_j(y).
- Extensions of the treatment setting
  - The framework is stated as extendable to K>2 groups but is not developed; provide explicit formulas, identification, and inference for multi-group and multi-arm settings.
  - Continuous or ordinal treatments are not addressed; extend to continuous exposures (e.g., dose–response) with density-based effects and provide identification/estimation strategies.
  - Interference and spillovers are not considered; outline how density effects might be defined/identified under network or cluster interference.
- Practical applicability and diagnostics
  - Guidance on diagnosing and mitigating support/extrapolation problems when integrating f(y|x) over F_X of another group is absent; provide tools for detecting off-support integration and remedies (e.g., overlap weights, covariate trimming).
  - No out-of-sample validation or calibration diagnostics for conditional density fits; propose PIT/coverage diagnostics, posterior predictive checks in Bayes space, or scoring rules for conditional densities.
  - Survey design and weights (relevant for SOEP) are not incorporated; develop weighted estimation and inference to account for complex survey designs and nonresponse.
  - Measurement error and heaping in income are not modeled; study robustness of density effects to outcome mismeasurement and propose correction strategies.
- Interpretation and policy relevance
  - Translating density-ratio findings into interpretable distributional summaries (e.g., effects on tail probabilities, mass at zero, inequality indices) is not formalized; provide standard functionals and their inference derived from estimated densities.
  - The decomposition conflates participation and wage-setting mechanisms when zeros reflect non-employment; develop layered decompositions (e.g., participation vs. wage conditional on employment) within the density framework.
- Application-specific issues (German East–West income gap)
  - Unconfoundedness for “origin-based” treatment is plausible but not justified empirically; conduct balancing checks, placebo tests, or instrumental strategies to bolster causal claims.
  - Potential treatment–covariate feedback (e.g., education or occupation affected by origin) is unaddressed; re-estimate excluding likely post-treatment covariates or apply mediation-aware decompositions.
  - Overlap between East- and West-origin covariate distributions is not examined; report overlap diagnostics and, if needed, perform overlap trimming with corresponding interpretation adjustments.
These gaps highlight theory, estimation, inference, and application aspects that, if addressed, would strengthen the causal interpretation, robustness, and practical utility of density-based counterfactual decompositions.