Propensity Score Estimates
- Propensity score estimates are conditional probabilities of treatment assignment given covariates, reducing bias in observational studies.
- They are estimated using methods such as logistic regression, machine learning, and calibration techniques to ensure covariate balance.
- Applications include inverse probability weighting, matching, and stratification to approximate randomization and improve causal estimates.
A propensity score is the conditional probability of receiving a treatment, given observed covariates, and is foundational for reducing confounding bias in observational studies. Formally, for a binary treatment assignment $T \in \{0, 1\}$ and covariate vector $X$, the propensity score is $e(x) = \Pr(T = 1 \mid X = x)$. Propensity score estimates enable the construction of quasi-experimental comparisons through inverse probability weighting (IPW), matching, or subclassification, each leveraging the score to approximate randomization. Estimation approaches are now highly diversified, including parametric, semiparametric, nonparametric, regularized, balancing, calibration, and machine learning methods. The selection of estimation strategy has substantial impact on bias, variance, robustness to model misspecification, finite-sample inference, and the domain of causal generalization.
1. Core Frameworks for Propensity Score Estimation
Standard estimation models begin with parametric forms, predominantly logistic regression: $\operatorname{logit}(e(x)) = \beta_0 + \sum_{i=1}^{p} \beta_i x_i$, where $\beta$ is estimated by maximum likelihood on sample data. More complex or high-dimensional scenarios favor penalized or flexible models, for example $\ell_1$-regularized estimators for variable selection or bias-variance control (Ning et al., 2018, Tan, 2017). Model misspecification, however, yields bias in subsequent causal effect estimates, motivating many balancing or calibration frameworks.
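A minimal sketch of this parametric baseline in Python with scikit-learn; the simulated data, variable names, and the nearly unpenalized fit (large `C`) are illustrative choices, not taken from the cited papers:

```python
# Minimal sketch: parametric propensity score estimation via logistic regression.
# Simulated data and all names are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 1000, 5
X = rng.normal(size=(n, p))                         # observed covariates
true_logit = 0.8 * X[:, 0] - 0.5 * X[:, 1]          # true assignment mechanism
T = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))  # binary treatment

# Maximum-likelihood fit of logit(e(x)) = beta_0 + sum_i beta_i x_i;
# a very large C makes the default ridge penalty negligible.
model = LogisticRegression(C=1e6, max_iter=1000).fit(X, T)
e_hat = model.predict_proba(X)[:, 1]                # estimated propensity scores
```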
Covariate balancing propensity score (CBPS) methods fit $\beta$ such that covariate means are balanced in the weighted sample: $\frac{1}{n}\sum_{i=1}^{n}\left[\frac{T_i}{e(X_i;\beta)} - \frac{1 - T_i}{1 - e(X_i;\beta)}\right] f(X_i) = 0$, where $f(X)$ is a basis of covariates and their transformations (Ning et al., 2018, Orihara, 2022). Alternative estimators target integrated distributional balance across the full covariate distribution (the Integrated Propensity Score, IPS) (Sant'Anna et al., 2018).
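To make the balancing idea concrete, the sketch below solves the just-identified mean-balancing conditions directly with a root finder; it omits the GMM machinery and overidentification handling of the CBPS literature, and the simulated data and basis choice are illustrative:

```python
# Minimal sketch of covariate-balancing estimation: choose beta so that
# signed IPW-weighted covariate means are exactly balanced (just-identified case).
import numpy as np
from scipy.optimize import root
from scipy.special import expit

rng = np.random.default_rng(1)
n, p = 1000, 3
X = rng.normal(size=(n, p))
T = rng.binomial(1, expit(X[:, 0] - 0.5 * X[:, 1]))
F = np.column_stack([np.ones(n), X])   # basis f(X): intercept plus covariates

def balance_moments(beta):
    e = expit(F @ beta)                # e(x; beta)
    w = T / e - (1 - T) / (1 - e)      # signed inverse-probability weights
    return F.T @ w / n                 # mean balancing conditions, one per basis term

beta_hat = root(balance_moments, x0=np.zeros(p + 1)).x
e_cbps = expit(F @ beta_hat)           # balancing propensity score estimates
```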
Machine learning and nonparametric approaches—such as boosted trees, deep neural networks, isotonic regression, or kernel-based estimators—relax model restrictions yet introduce challenges in calibration and valid weighting (Peng et al., 7 Apr 2024, Xu et al., 2022, Liu et al., 2022).
2. Propensity Score-Based Estimators for Causal Effects
Propensity score estimates are central to three principal classes of causal estimators (a combined sketch of all three follows this list):
- Inverse Probability Weighting (IPW): IPW reweights outcomes by the inverse of the estimated treatment probabilities; stabilized or truncated weights are used to control variance when scores approach 0 or 1 (Poletto et al., 30 Aug 2024).
- Propensity Score Matching (PSM): Pairs treated and control units by nearest neighbor in the estimated propensity score, often with a caliper restricting matches to a maximum score distance. 1:1 nearest-neighbor matching without replacement and with an empirically chosen caliper width is typical; the treatment effect is then estimated on the matched cohort (Poletto et al., 30 Aug 2024, Liu et al., 2022).
- Stratification/Subclassification: Subjects are partitioned into strata based on quantiles or grid values of the estimated propensity score; treatment effects are estimated within each stratum and combined as a weighted average (Poletto et al., 30 Aug 2024, Orihara et al., 19 Oct 2024).
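The following sketch implements simple versions of all three estimators, assuming NumPy arrays `y` (outcome), `t` (treatment), and `e` (estimated scores, e.g. `e_hat` from the logistic sketch above); the truncation level, caliper width, and number of strata are illustrative defaults, not recommendations from the cited studies:

```python
# Minimal sketches of the three estimator classes; illustrative defaults only.
import numpy as np

def ipw_ate(y, t, e, eps=0.01):
    e = np.clip(e, eps, 1 - eps)       # truncation to control variance
    return np.mean(t * y / e - (1 - t) * y / (1 - e))

def matched_att(y, t, e, caliper=0.05):
    # 1:1 nearest-neighbor matching on the score, without replacement.
    treated, controls = np.where(t == 1)[0], np.where(t == 0)[0]
    used, diffs = set(), []
    for i in treated:
        dist = np.abs(e[controls] - e[i])
        for j in np.argsort(dist):     # candidates from nearest to farthest
            if dist[j] > caliper:
                break                  # no admissible match within the caliper
            if controls[j] not in used:
                used.add(controls[j])
                diffs.append(y[i] - y[controls[j]])
                break
    return np.mean(diffs)

def stratified_ate(y, t, e, k=5):
    # Subclassify on propensity-score quantiles; weight strata by size.
    edges = np.quantile(e, np.linspace(0, 1, k + 1))
    s = np.clip(np.searchsorted(edges, e, side="right") - 1, 0, k - 1)
    effects, sizes = [], []
    for q in range(k):
        m = s == q
        if t[m].min() != t[m].max():   # require both arms in the stratum
            effects.append(y[m & (t == 1)].mean() - y[m & (t == 0)].mean())
            sizes.append(m.sum())
    return np.average(effects, weights=sizes)
```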
3. Calibration, Balance, and Efficiency
Well-calibrated propensity scores (i.e., predicted probabilities equal empirical treatment assignment rates) are a necessary condition for unbiased treatment effect estimation with IPW and augmented-IPW (AIPW) estimators (Gutman et al., 2022, Deshpande et al., 2023). Empirical decomposition reveals that lack of calibration induces bias even if covariate balance is achieved (Deshpande et al., 2023). Calibration is particularly challenging for models estimated with expressive learners (random forests, neural nets); post-calibration with Platt scaling or isotonic regression dramatically reduces bias (Gutman et al., 2022, Deshpande et al., 2023).
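A minimal sketch of this two-step recipe, fitting a random forest on one split and isotonic-calibrating its scores on a held-out split; the learner, split ratio, and simulated data are illustrative, and this is only one of the calibration schemes discussed in the cited papers:

```python
# Minimal sketch of post-calibration: fit a flexible learner on one split,
# then isotonic-calibrate its raw scores against held-out treatment labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 5))
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))

X_fit, X_cal, T_fit, T_cal = train_test_split(X, T, test_size=0.5, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_fit, T_fit)
raw = rf.predict_proba(X_cal)[:, 1]        # uncalibrated scores on held-out split

# Isotonic regression maps raw scores to empirically calibrated probabilities.
iso = IsotonicRegression(out_of_bounds="clip").fit(raw, T_cal)
e_calibrated = iso.predict(rf.predict_proba(X)[:, 1])
```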
Covariate balance—whether enforced globally or locally (i.e., within small neighborhoods of the score)—is another axis of estimator quality. Local balancing is essential for reducing bias in heterogeneously distributed covariates or under model misspecification (Peng et al., 7 Apr 2024).
Semiparametric efficiency bounds are reached when estimation procedures ensure all relevant covariate structure is captured by the balancing score without excessive model-based restriction (Liu et al., 2022, Sant'Anna et al., 2018).
4. Subgroup and Heterogeneity-Oriented Extensions
When interest lies in subgroup average treatment effects (SATEs, or the ATT by subgroup), estimates based solely on an overall propensity score model can yield substantial within-subgroup imbalance and bias. Multiple frameworks address this (a minimal balance-comparison sketch follows this list):
- Subgroup Balancing Propensity Score (SBPS) adaptively selects, for each subgroup, whether the PS for units in that group is fit overall or within-subgroup, optimizing subgroup and global covariate balance simultaneously through a stochastic search/minimization (Dong et al., 2017). This enables accurate subgroup causal effect (e.g., ATT) estimation, outperforming standard approaches in both bias and confidence interval coverage.
- Guaranteed Subgroup-Balanced Propensity Score methods (G-SBPS / kG-SBPS) enforce zero mean differences on both global and subgroup-specific (and optionally kernel-featured) covariates, producing exact balance within each target group (Li et al., 17 Apr 2024). This enhances robustness to misspecification, particularly for complex subpopulations or nonlinear confounder structures.
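The sketch below illustrates the underlying problem rather than either cited method: it compares within-subgroup weighted covariate balance under an overall propensity fit versus subgroup-specific fits, on simulated data where the assignment mechanism differs by subgroup. All names and data are illustrative, and the stochastic search of SBPS and the exact-balance constraints of G-SBPS are omitted:

```python
# Minimal sketch: within-subgroup balance under overall vs. subgroup-specific fits.
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 2000
g = rng.binomial(1, 0.5, n)                         # subgroup indicator
X = rng.normal(size=(n, 2))
# Assignment mechanism differs by subgroup.
T = rng.binomial(1, expit((1 - g) * X[:, 0] - g * X[:, 1]))

def weighted_smd(x, t, w):
    """Standardized mean difference of x between arms under weights w."""
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    return (m1 - m0) / x.std()

overall = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
for grp in (0, 1):
    m = g == grp
    sub = LogisticRegression().fit(X[m], T[m]).predict_proba(X[m])[:, 1]
    for e, label in ((overall[m], "overall fit"), (sub, "subgroup fit")):
        w = T[m] / e + (1 - T[m]) / (1 - e)         # ATE-style IPW weights
        smds = [round(weighted_smd(X[m][:, j], T[m], w), 3) for j in range(2)]
        print(f"subgroup {grp}, {label}: SMDs = {smds}")
```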
Bayesian approaches, such as the Bayesian-based subclassification estimator, integrate over uncertainty in the number of strata for subclassification, thereby reflecting design-stage uncertainty in the inferential phase (Orihara et al., 19 Oct 2024).
5. Robustness, Regularization, and High Dimensionality
High-dimensional confounding, model misspecification, and complex sampling structures motivate robust and regularized estimation (a minimal sparse-fit sketch follows this list):
- Regularized Calibrated Estimation blends calibration and $\ell_1$-penalization, minimizing a loss constructed to bound mean squared relative errors of inverse weights—a key determinant of bias in outcome means (Tan, 2017). This regime achieves improved mean balance, higher-order stability, and interpretable high-dimensional convergence rates.
- Double Robustness and High-Dimensional Covariate Balancing: The HD-CBPS procedure achieves $\sqrt{n}$-consistency and asymptotic normality—provided either the PS or outcome model is correct—by combining penalized estimation with a secondary calibration step targeting prognostic covariates (Ning et al., 2018). Empirically, this sequence outperforms comparable regularized AIPW or approximate balancing methods in RMSE and coverage.
- Direct Bias-Correction Term Estimation: Recent approaches estimate the bias-correction term directly, rather than the propensity score itself, enabling precise targeting of the mean squared error of treatment effect estimators and improved finite-sample MSE even in complex or nonlinear settings (Kato, 26 Sep 2025).
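As a simplified stand-in for these procedures, the sketch below fits a plain $\ell_1$-penalized (lasso) logistic propensity model in a high-dimensional design; it implements neither Tan's calibrated loss nor the HD-CBPS calibration step, and the data are illustrative:

```python
# Minimal sketch of an l1-penalized propensity model with p large relative to n.
# This is plain lasso-logistic, not the calibrated losses of the cited papers.
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(4)
n, p = 500, 200                                     # high-dimensional design
X = rng.normal(size=(n, p))
T = rng.binomial(1, expit(X[:, 0] - X[:, 1]))       # sparse true assignment model

lasso_ps = LogisticRegressionCV(
    penalty="l1", solver="saga", Cs=10, cv=5, max_iter=2000
).fit(X, T)
e_hat = lasso_ps.predict_proba(X)[:, 1]
selected = np.flatnonzero(lasso_ps.coef_.ravel())   # covariates kept by the penalty
```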
6. Nonparametric and Tuning-Free Estimation Strategies
Nonparametric and shape-constrained methods such as isotonic regression or monotone MLEs provide tuning-parameter-free alternatives suitable wherever monotonicity of the PS is plausible. These procedures yield piecewise-constant estimated scores, with matching conducted over automatically defined blocks ("block-matching"). Under regularity and monotonicity, block-matching or one-to-many isotonic approaches attain the semiparametric efficiency bound for treatment effect estimation (Xu et al., 2022, Liu et al., 2022). They are robust to link function misspecification, obviate the need for manual tuning (e.g., caliper width, nearest neighbor count), and in univariate or index models, require only the pool-adjacent-violators algorithm for efficient computation.
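A minimal sketch of the univariate case: isotonic regression of treatment on a scalar covariate (scikit-learn's implementation uses the pool-adjacent-violators algorithm) yields a piecewise-constant score whose distinct levels define the matching blocks. Data and names are illustrative:

```python
# Minimal sketch of shape-constrained, tuning-free propensity estimation:
# isotonic regression of T on a scalar covariate with a monotone true score.
import numpy as np
from scipy.special import expit
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(5)
x = rng.uniform(-2, 2, 1500)
T = rng.binomial(1, expit(2 * x))                   # monotone assignment mechanism

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip").fit(x, T)
e_iso = iso.predict(x)                              # piecewise-constant estimate
blocks = np.unique(e_iso)                           # levels define matching blocks
```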
Kernel- and deep learning-based PS estimation further relax parametric requirements, with explicit regularization and balance-promoting loss functions needed to guarantee covariate distributional alignment and calibration (Peng et al., 7 Apr 2024). Such methods achieve state-of-the-art finite-sample balance and IPW stability even under strong misspecification and high dimensionality.
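A hedged PyTorch sketch of the balance-promoting idea: a small network trained with cross-entropy plus a penalty on IPW-weighted covariate mean imbalance. The architecture, penalty form, and penalty weight are illustrative simplifications, not the loss of Peng et al.:

```python
# Minimal sketch of a balance-promoting loss for a neural propensity model:
# cross-entropy plus a penalty on IPW-weighted covariate mean imbalance.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, p = 1000, 5
X = torch.randn(n, p)
T = torch.bernoulli(torch.sigmoid(X[:, 0] - X[:, 1]))

net = nn.Sequential(nn.Linear(p, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()

for _ in range(500):
    opt.zero_grad()
    logits = net(X).squeeze(1)
    e = torch.sigmoid(logits).clamp(0.01, 0.99)
    # IPW-weighted covariate means in each arm; their gap is the imbalance penalty.
    m1 = ((T / e).unsqueeze(1) * X).mean(0)
    m0 = (((1 - T) / (1 - e)).unsqueeze(1) * X).mean(0)
    imbalance = ((m1 - m0) ** 2).sum()
    loss = bce(logits, T) + 1.0 * imbalance         # penalty weight is illustrative
    loss.backward()
    opt.step()
```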
7. Special Sampling Structures and Measurement Error
Sampling schemes such as length-biased sampling or oversampling of rare exposures render naïve PS estimation inconsistent. Weighted estimating equations or sample prevalence reweighting yield consistent estimators when population prevalence or censoring distribution is known or estimable, and are agnostic to the base algorithm used (Rose, 2018, Ertefaie et al., 2013).
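One simple instance, offered as a stand-in for the cited estimating-equation approaches rather than their method: the classical case-control prior correction, which rescales the fitted odds by the ratio of population to sample exposure odds when the population prevalence is known. All names are illustrative:

```python
# Minimal sketch of prevalence (prior) correction when exposed units are
# oversampled: rescale fitted odds by population-to-sample exposure odds.
import numpy as np

def prevalence_corrected_ps(e_sample, pop_prev, sample_prev):
    """Map scores fit on an exposure-oversampled sample back to the population."""
    odds = e_sample / (1 - e_sample)
    correction = (pop_prev / (1 - pop_prev)) * ((1 - sample_prev) / sample_prev)
    corrected = odds * correction
    return corrected / (1 + corrected)

# e.g., scores fit on a 50/50 oversampled design, true exposure prevalence 5%:
e_raw = np.array([0.3, 0.5, 0.8])
print(prevalence_corrected_ps(e_raw, pop_prev=0.05, sample_prev=0.5))
```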
Covariate measurement error introduces bias in both the propensity score and the estimated treatment effect, with the magnitude modulated by the correlation among true confounders and among measurement errors. Auxiliary variables measured without error and correlated with the confounders can attenuate this bias. Sensitivity analysis for reliability and measurement error correlation is recommended (1706.02283).
References
- (Poletto et al., 30 Aug 2024) "Comparing Propensity Score-Based Methods in Estimating the Treatment Effects: A Simulation Study"
- (Dong et al., 2017) "Subgroup Balancing Propensity Score"
- (Rose, 2018) "Consistent Estimation of Propensity Score Functions with Oversampled Exposed Subjects"
- (Ertefaie et al., 2013) "The Propensity Score Estimation in the Presence of Length-biased Sampling: A Nonparametric Adjustment Approach"
- (Ning et al., 2018) "Robust Estimation of Causal Effects via High-Dimensional Covariate Balancing Propensity Score"
- (Gutman et al., 2022) "Propensity score models are better when post-calibrated"
- (Deshpande et al., 2023) "Calibrated and Conformal Propensity Scores for Causal Effect Estimation"
- (Liu et al., 2022) "Tuning-parameter-free optimal propensity score matching approach for causal inference"
- (Li et al., 17 Apr 2024) "Propensity Score Analysis with Guaranteed Subgroup Balance"
- (Kato, 26 Sep 2025) "Direct Bias-Correction Term Estimation for Propensity Scores and Average Treatment Effect Estimation"
- (Tan, 2017) "Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data"
- (Peng et al., 7 Apr 2024) "A Deep Learning Approach to Nonparametric Propensity Score Estimation with Optimized Covariate Balance"
- (Xu et al., 2022) "Isotonic propensity score matching"
- (1706.02283) "Propensity score-based estimators with multiple error-prone covariates"
- (Sant'Anna et al., 2018) "Covariate Distribution Balance via Propensity Scores"
- (Orihara, 2022) "Robust Estimating Method for Propensity Score Models and its Application to Some Causal Estimands: A review and proposal"
- (Orihara et al., 19 Oct 2024) "Bayesian-based Propensity Score Subclassification Estimator"
- (Su et al., 2023) "When is the estimated propensity score better? High-dimensional analysis and bias correction"
Summary Table: Core Estimation Approaches and Features
| Method | Model & Balance Principle | Main Features/Advantages |
|---|---|---|
| Logistic/Parametric MLE | Model-based (GLM) | Tractable, interpretable, but sensitive to misspecification |
| CBPS | Moment balancing (GMM) | Directly targets covariate means/balance; robustifies against PS misspecification; flexible basis selection |
| Integrated Propensity Score (IPS) | Integrated distributional balance | Balances entire covariate distribution; tuning-free criterion; globally efficient |
| Isotonic/Monotone Nonparametric | Shape-restricted regression | Tuning-parameter-free; automatic match/block size; achieves semiparametric efficiency in monotone settings |
| HD-CBPS (Penalized/Calibrated) | Sparsity + calibration | Controls high-dimensional bias/variance; double-robustness; consistent under one correct model (PS or outcome) |
| Deep/Kernel/ML Estimators | Nonparametric ML | Highly flexible functional form; requires calibration/balance regularization for valid inference |
| Subgroup/Guaranteed Balance (SBPS, G-SBPS, kG-SBPS) | Subgroup + covariate balance | Adaptive for subgroup SATE/ATT; integrates subgroup-specific PS construction; improves subgroup inference |
| Direct Bias-Correction | Focused on IPW estimator MSE | Direct minimization of bias-correction term error; improves finite-sample MSE for ATE |
| Weighted/Adjusted for Sampling/Errors | Design-adapted loss/weight | Guarantees consistency under oversampling or length-biased sampling; robustifies with measurement error if reliability/correlation are characterized |
This array of estimation and design approaches for propensity scores provides rigorous, flexible, and increasingly robust tools for unbiased and efficient causal inference in contemporary observational studies.