Propensity Score Estimates

Updated 11 December 2025
  • Propensity score estimates are conditional probabilities of treatment assignment given covariates, reducing bias in observational studies.
  • They are estimated with methods ranging from logistic regression to machine learning, together with balancing and calibration techniques that target covariate balance and well-calibrated probabilities.
  • Applications include inverse probability weighting, matching, and stratification to approximate randomization and improve causal estimates.

A propensity score is the conditional probability of receiving a treatment, given observed covariates, and is foundational for reducing confounding bias in observational studies. Formally, for a binary treatment assignment $A$ and covariate vector $X$, the propensity score is $e(x) = P(A = 1 \mid X = x)$. Propensity score estimates enable the construction of quasi-experimental comparisons through inverse probability weighting (IPW), matching, or subclassification, each leveraging the score to approximate randomization. Estimation approaches are now highly diversified, including parametric, semiparametric, nonparametric, regularized, balancing, calibration, and machine learning methods. The selection of estimation strategy has substantial impact on bias, variance, robustness to model misspecification, finite-sample inference, and the domain of causal generalization.

1. Core Frameworks for Propensity Score Estimation

Standard estimation models begin with parametric forms, predominantly logistic regression: $\operatorname{logit}(e(x)) = \beta_0 + \sum_{i=1}^{p} \beta_i x_i$, where $\beta$ is estimated by maximum likelihood on sample data. More complex or high-dimensional scenarios favor penalized or flexible models, for example, $\ell_1$-regularized estimators for variable selection or bias-variance control (Ning et al., 2018, Tan, 2017). Model misspecification, however, yields bias in subsequent causal effect estimates, motivating many balancing or calibration frameworks.
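A minimal sketch of this parametric baseline, assuming simulated data and scikit-learn (all names and data-generating values are illustrative, not taken from any cited paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 1000, 5
X = rng.normal(size=(n, p))                        # observed covariates
A = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.25 * X[:, 1]))))

# Maximum-likelihood fit of logit(e(x)) = beta_0 + sum_i beta_i x_i;
# penalty=None gives plain MLE, while an l1 penalty (solver="saga")
# recovers the regularized variants mentioned above.
model = LogisticRegression(penalty=None, max_iter=1000).fit(X, A)
e_hat = model.predict_proba(X)[:, 1]               # estimated propensity scores
```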

Covariate balancing propensity score (CBPS) methods fit $e(x)$ such that covariate means are balanced in the weighted sample: $\frac{1}{n} \sum_{i=1}^n [A_i - e(X_i; \beta)] f(X_i) = 0$, where $f(X)$ is a basis of covariates and their transformations (Ning et al., 2018, Orihara, 2022). Alternative estimators target integrated distributional balance across the full covariate distribution (the Integrated Propensity Score, IPS) (Sant'Anna et al., 2018).
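A minimal sketch of the balancing condition above, solved as a just-identified system with $f(X) = (1, X)$, in which case it coincides with the logistic score equations; richer bases $f$ make the system over-identified and call for GMM. All names and simulated values are illustrative:

```python
import numpy as np
from scipy.optimize import root
from scipy.special import expit

def balance_eqs(beta, F, A):
    """(1/n) * sum_i [A_i - e(X_i; beta)] f(X_i), stacked over the basis."""
    return F.T @ (A - expit(F @ beta)) / len(A)

rng = np.random.default_rng(0)
n, p = 1000, 3
X = rng.normal(size=(n, p))
A = rng.binomial(1, expit(X[:, 0] - 0.5 * X[:, 1]))

F = np.column_stack([np.ones(n), X])    # basis f(X) = (1, X)
sol = root(balance_eqs, x0=np.zeros(p + 1), args=(F, A))
e_hat = expit(F @ sol.x)                # scores satisfying the moment condition
```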

Machine learning and nonparametric approaches—such as boosted trees, deep nets, isotonic regression, or kernel-based estimators—relax model restrictions yet introduce challenges in calibration and valid weighting (Peng et al., 7 Apr 2024, Xu et al., 2022, Liu et al., 2022).

2. Propensity Score-Based Estimators for Causal Effects

Propensity score estimates are central in constructing three principal classes of causal estimators (a minimal sketch of all three follows the list):

  • Inverse Probability Weighting (IPW): $\hat{\tau}_{\mathrm{IPW}} = \frac{1}{n} \sum_{i=1}^n \left[\frac{A_i Y_i}{\hat e(X_i)} - \frac{(1-A_i) Y_i}{1-\hat e(X_i)}\right]$. IPW reweights outcomes by estimated treatment probabilities; stabilized or truncated weights are used to control variance when scores approach 0 or 1 (Poletto et al., 30 Aug 2024).
  • Propensity Score Matching (PSM): Pairs treated and control units by nearest neighbor in the estimated propensity score, often with a caliper restricting matches to a maximum distance. 1:1 nearest-neighbor matching without replacement and with an empirically chosen caliper width is typical; the treatment effect is then estimated on the matched cohort (Poletto et al., 30 Aug 2024, Liu et al., 2022).
  • Stratification/Subclassification: Subjects are partitioned into strata based on quantiles or grid values of the estimated propensity score; treatment effects are estimated within each stratum and combined as a weighted average (Poletto et al., 30 Aug 2024, Orihara et al., 19 Oct 2024).
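A minimal sketch of these three estimators, assuming estimated scores e_hat, treatment indicator A, and outcome Y as NumPy arrays; the caliper width, truncation level, and number of strata are illustrative choices, not specifications from the cited papers:

```python
import numpy as np

def ipw_ate(Y, A, e_hat, trunc=0.01):
    """IPW estimate with weights truncated away from 0 and 1."""
    e = np.clip(e_hat, trunc, 1 - trunc)
    return np.mean(A * Y / e - (1 - A) * Y / (1 - e))

def matched_att(Y, A, e_hat, caliper=0.05):
    """1:1 nearest-neighbor matching on the score within a caliper.
    Matches with replacement for brevity; the setup described above
    matches without replacement."""
    treated, controls = np.where(A == 1)[0], np.where(A == 0)[0]
    diffs = []
    for i in treated:
        j = controls[np.argmin(np.abs(e_hat[controls] - e_hat[i]))]
        if abs(e_hat[j] - e_hat[i]) <= caliper:
            diffs.append(Y[i] - Y[j])
    return float(np.mean(diffs)) if diffs else float("nan")

def stratified_ate(Y, A, e_hat, n_strata=5):
    """Quantile subclassification; strata combined by sample-size weights."""
    cuts = np.quantile(e_hat, np.linspace(0, 1, n_strata + 1)[1:-1])
    stratum = np.digitize(e_hat, cuts)
    est, total = 0.0, 0
    for s in range(n_strata):
        m = stratum == s
        if m.sum() and A[m].min() != A[m].max():  # both arms present
            est += m.sum() * (Y[m][A[m] == 1].mean() - Y[m][A[m] == 0].mean())
            total += m.sum()
    return est / total
```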

3. Calibration, Balance, and Efficiency

Well-calibrated propensity scores (i.e., predicted probabilities equal empirical treatment assignment rates) are a necessary condition for unbiased treatment effect estimation with IPW and augmented-IPW (AIPW) estimators (Gutman et al., 2022, Deshpande et al., 2023). Empirical decomposition reveals that lack of calibration induces bias even if covariate balance is achieved (Deshpande et al., 2023). Calibration is particularly challenging for models estimated with expressive learners (random forests, neural nets); post-calibration with Platt scaling or isotonic regression dramatically reduces bias (Gutman et al., 2022, Deshpande et al., 2023).
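A minimal sketch of post-calibration, assuming a random forest propensity model recalibrated with isotonic regression on a held-out split (Platt scaling would substitute a logistic fit on the raw scores); the split sizes and estimators are illustrative choices:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
A = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - X[:, 1]))))

# Fit the flexible model on one half, recalibrate on the other.
X_fit, X_cal, A_fit, A_cal = train_test_split(X, A, test_size=0.5, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_fit, A_fit)

raw_cal = forest.predict_proba(X_cal)[:, 1]           # typically miscalibrated
iso = IsotonicRegression(out_of_bounds="clip").fit(raw_cal, A_cal)
e_hat = iso.predict(forest.predict_proba(X)[:, 1])    # calibrated scores
```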

Covariate balance—whether enforced globally or locally (i.e., within small neighborhoods of the score)—is another axis of estimator quality. Local balancing is essential for reducing bias when covariates are heterogeneously distributed across the score or under model misspecification (Peng et al., 7 Apr 2024).

Semiparametric efficiency bounds are reached when estimation procedures ensure all relevant covariate structure is captured by the balancing score without excessive model-based restriction (Liu et al., 2022, Sant'Anna et al., 2018).

4. Subgroup and Heterogeneity-Oriented Extensions

When interest is in subgroup average treatment effects (SATEs or ATT by subgroup), estimates based solely on overall propensity score models can yield substantial within-subgroup imbalance and bias. Multiple frameworks address this:

  • Subgroup Balancing Propensity Score (SBPS) adaptively selects, for each subgroup, whether the PS for units in that group is fit overall or within-subgroup, optimizing subgroup and global covariate balance simultaneously through a stochastic search/minimization (Dong et al., 2017). This enables accurate subgroup causal effect (e.g., ATT) estimation, outperforming standard approaches in both bias and CI coverage; a simplified sketch appears after this list.
  • Guaranteed Subgroup-Balanced Propensity Score methods (G-SBPS / kG-SBPS) enforce zero mean differences on both global and subgroup-specific (and optionally kernel-featured) covariates, producing exact balance within each target group (Li et al., 17 Apr 2024). This enhances robustness to misspecification, particularly for complex subpopulations or nonlinear confounder structures.
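A simplified, greedy rendition of the SBPS idea, assuming each subgroup contains both treatment arms: for each subgroup, keep the overall-sample fit or the within-subgroup fit, whichever yields smaller weighted covariate imbalance. The full method searches over subgroup assignments jointly; everything here is an illustrative sketch, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def smd(X, A, w):
    """Mean absolute weighted standardized mean difference over covariates."""
    m1 = np.average(X[A == 1], weights=w[A == 1], axis=0)
    m0 = np.average(X[A == 0], weights=w[A == 0], axis=0)
    return np.mean(np.abs(m1 - m0) / X.std(axis=0))

def ipw_imbalance(X, A, e):
    w = np.where(A == 1, 1 / e, 1 / (1 - e))          # ATE-type IPW weights
    return smd(X, A, w)

def sbps_scores(X, A, groups):
    """Per subgroup, keep the overall or within-subgroup fit, whichever
    balances that subgroup's covariates better (greedy simplification)."""
    overall = LogisticRegression(max_iter=1000).fit(X, A).predict_proba(X)[:, 1]
    e_hat = overall.copy()
    for g in np.unique(groups):
        m = groups == g                                # assumes both arms in g
        local = LogisticRegression(max_iter=1000).fit(X[m], A[m]).predict_proba(X[m])[:, 1]
        if ipw_imbalance(X[m], A[m], local) < ipw_imbalance(X[m], A[m], overall[m]):
            e_hat[m] = local
    return e_hat
```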

Bayesian approaches, such as the Bayesian-based subclassification estimator, integrate over uncertainty in the number of strata for subclassification, thereby reflecting design-stage uncertainty in the inferential phase (Orihara et al., 19 Oct 2024).

5. Robustness, Regularization, and High Dimensionality

High-dimensional confounding, model misspecification, and complex sampling structures motivate robust and regularized estimation:

  • Regularized Calibrated Estimation blends calibration and $\ell_1$-penalization, minimizing a loss constructed to bound mean squared relative errors of inverse weights—a key determinant of bias in outcome means (Tan, 2017). This approach achieves improved mean balance, higher-order stability, and interpretable high-dimensional convergence rates.
  • Double Robustness and High-Dimensional Covariate Balancing: The HD-CBPS procedure achieves $\sqrt{n}$-consistency and asymptotic normality—provided either the PS or outcome model is correct—by combining penalized estimation with a secondary calibration step targeting prognostic covariates (Ning et al., 2018). Empirically, this sequence outperforms comparable regularized AIPW or approximate balancing methods in RMSE and coverage.
  • Direct Bias-Correction Term Estimation: Recent approaches directly estimate the bias-correction term $h_0(X, D) = \frac{\mathbf{1}\{D=1\}}{e_0(X)} - \frac{\mathbf{1}\{D=0\}}{1-e_0(X)}$ rather than $e_0(X)$ itself, enabling precise targeting of the mean-squared error in treatment effect estimators and providing improved finite-sample MSE even in complex or nonlinear settings (Kato, 26 Sep 2025).
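For context, a minimal sketch of where $h_0$ enters: the AIPW estimator averages $\mu_1(X) - \mu_0(X) + h(X, D)\,(Y - \mu_D(X))$. Below, $h$ is formed by plugging an estimated score into the formula above; the cited approach instead estimates $h_0$ directly. The outcome-model arrays mu0 and mu1 and all names are illustrative assumptions:

```python
import numpy as np

def aipw_ate(Y, D, e_hat, mu0, mu1):
    """Doubly robust ATE using a plug-in bias-correction term h(X, D)."""
    h = np.where(D == 1, 1 / e_hat, -1 / (1 - e_hat))  # plug-in h from e_hat
    mu_d = np.where(D == 1, mu1, mu0)                  # fitted outcome, own arm
    return np.mean(mu1 - mu0 + h * (Y - mu_d))
```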

6. Nonparametric and Tuning-Free Estimation Strategies

Nonparametric and shape-constrained methods such as isotonic regression or monotone MLEs provide tuning-parameter-free alternatives suitable wherever monotonicity of the PS is plausible. These procedures yield piecewise-constant estimated scores, with matching conducted over automatically defined blocks ("block-matching"). Under regularity and monotonicity, block-matching or one-to-many isotonic approaches attain the semiparametric efficiency bound for treatment effect estimation (Xu et al., 2022, Liu et al., 2022). They are robust to link function misspecification, obviate the need for manual tuning (e.g., caliper width, nearest neighbor count), and in univariate or index models, require only the pool-adjacent-violators algorithm for efficient computation.
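A minimal sketch of the univariate case, assuming the score is monotone in a single index x; the data-generating values and the block-level ATE computation are illustrative assumptions (scikit-learn's IsotonicRegression implements the pool-adjacent-violators algorithm):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                                # univariate index
A = rng.binomial(1, 1 / (1 + np.exp(-x)))             # PS monotone in x
Y = 2.0 * A + x + rng.normal(size=n)                  # outcome; true ATE = 2

# Pool-adjacent-violators fit: a piecewise-constant, monotone score.
e_hat = IsotonicRegression(y_min=0, y_max=1).fit(x, A).predict(x)

# Each constant piece defines a block; estimate the ATE block by block.
est, total = 0.0, 0
for v in np.unique(e_hat):
    m = e_hat == v
    if A[m].min() != A[m].max():                      # both arms present
        est += m.sum() * (Y[m][A[m] == 1].mean() - Y[m][A[m] == 0].mean())
        total += m.sum()
print(est / total)                                    # close to the true ATE of 2
```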

Kernel and deep learning-based PS estimation further relax parametric requirements, with explicit regularization and balance-promoting loss functions necessary to guarantee covariate distributional alignment and calibration (Peng et al., 7 Apr 2024). These methods achieve state-of-the-art finite-sample balance and IPW stability even under strong misspecification and high dimensionality.

7. Special Sampling Structures and Measurement Error

Sampling schemes such as length-biased sampling or oversampling of rare exposures render naïve PS estimation inconsistent. Weighted estimating equations or sample prevalence reweighting yield consistent estimators when population prevalence or censoring distribution is known or estimable, and are agnostic to the base algorithm used (Rose, 2018, Ertefaie et al., 2013).
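A minimal sketch of sample-prevalence reweighting under oversampling of exposed subjects, assuming the population exposure prevalence pi_pop is known; the weighted logistic fit is one choice of base algorithm, to which the cited estimators are agnostic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def prevalence_weighted_ps(X, A, pi_pop):
    """Propensity fit weighted by population-to-sample prevalence ratios,
    correcting for oversampled exposed subjects."""
    p_samp = A.mean()                                  # sample exposure prevalence
    w = np.where(A == 1, pi_pop / p_samp, (1 - pi_pop) / (1 - p_samp))
    model = LogisticRegression(max_iter=1000).fit(X, A, sample_weight=w)
    return model.predict_proba(X)[:, 1]
```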

Covariate measurement error introduces bias in both the propensity score and the estimated treatment effect, with the magnitude modulated by the correlation among true confounders and among measurement errors. Auxiliary variables measured without error and correlated with the confounders can attenuate this bias. Sensitivity analysis for reliability and measurement error correlation is recommended (1706.02283).
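An illustrative simulation of this attenuation, assuming a null treatment effect and a single confounder measured with noise (all parameters are arbitrary choices): the score fit on the error-prone measurement leaves residual confounding, while the error-free covariate removes it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=(n, 1))                        # true confounder
W = X + rng.normal(size=(n, 1))                    # error-prone measurement
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = X[:, 0] + rng.normal(size=n)                   # true treatment effect is 0

def ipw(Z):
    e = LogisticRegression(max_iter=1000).fit(Z, A).predict_proba(Z)[:, 1]
    return np.mean(A * Y / e - (1 - A) * Y / (1 - e))

print(ipw(X))   # ~0: confounding removed by the true covariate
print(ipw(W))   # biased away from 0: residual confounding from measurement error
```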


References

  • (Poletto et al., 30 Aug 2024) "Comparing Propensity Score-Based Methods in Estimating the Treatment Effects: A Simulation Study"
  • (Dong et al., 2017) "Subgroup Balancing Propensity Score"
  • (Rose, 2018) "Consistent Estimation of Propensity Score Functions with Oversampled Exposed Subjects"
  • (Ertefaie et al., 2013) "The Propensity Score Estimation in the Presence of Length-biased Sampling: A Nonparametric Adjustment Approach"
  • (Ning et al., 2018) "Robust Estimation of Causal Effects via High-Dimensional Covariate Balancing Propensity Score"
  • (Gutman et al., 2022) "Propensity score models are better when post-calibrated"
  • (Deshpande et al., 2023) "Calibrated and Conformal Propensity Scores for Causal Effect Estimation"
  • (Liu et al., 2022) "Tuning-parameter-free optimal propensity score matching approach for causal inference"
  • (Li et al., 17 Apr 2024) "Propensity Score Analysis with Guaranteed Subgroup Balance"
  • (Kato, 26 Sep 2025) "Direct Bias-Correction Term Estimation for Propensity Scores and Average Treatment Effect Estimation"
  • (Tan, 2017) "Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data"
  • (Peng et al., 7 Apr 2024) "A Deep Learning Approach to Nonparametric Propensity Score Estimation with Optimized Covariate Balance"
  • (Xu et al., 2022) "Isotonic propensity score matching"
  • (1706.02283) "Propensity score-based estimators with multiple error-prone covariates"
  • (Sant'Anna et al., 2018) "Covariate Distribution Balance via Propensity Scores"
  • (Orihara, 2022) "Robust Estimating Method for Propensity Score Models and its Application to Some Causal Estimands: A review and proposal"
  • (Orihara et al., 19 Oct 2024) "Bayesian-based Propensity Score Subclassification Estimator"
  • (Su et al., 2023) "When is the estimated propensity score better? High-dimensional analysis and bias correction"

Summary Table: Core Estimation Approaches and Features

| Method | Model & Balance Principle | Main Features/Advantages |
| --- | --- | --- |
| Logistic/Parametric MLE | Model-based (GLM) | Tractable, interpretable, but sensitive to misspecification |
| CBPS | Moment balancing (GMM) | Directly targets covariate means/balance; robustifies against PS misspecification; flexible basis selection |
| Integrated Propensity Score (IPS) | Integrated distributional balance | Balances the entire covariate distribution; tuning-free criterion; globally efficient |
| Isotonic/Monotone Nonparametric | Shape-restricted regression | Tuning-parameter-free; automatic match/block size; achieves semiparametric efficiency in monotone settings |
| HD-CBPS (Penalized/Calibrated) | Sparsity + calibration | Controls high-dimensional bias/variance; doubly robust; consistent under one correct model (PS or outcome) |
| Deep/Kernel/ML Estimators | Nonparametric ML | Highly flexible functional form; requires calibration/balance regularization for valid inference |
| Subgroup/Guaranteed Balance (SBPS, G-SBPS, kG-SBPS) | Subgroup + covariate balance | Adaptive for subgroup SATE/ATT; integrates subgroup-specific PS construction; improves subgroup inference |
| Direct Bias-Correction | Targets IPW estimator MSE | Direct minimization of bias-correction-term error; improves finite-sample MSE for the ATE |
| Weighted/Adjusted for Sampling/Errors | Design-adapted loss/weights | Consistent under oversampling or length-biased sampling; robust to measurement error when reliability/correlation are characterized |

This array of estimation and design approaches for propensity scores provides rigorous, flexible, and increasingly robust tools for unbiased and efficient causal inference in contemporary observational studies.
