
Redefined Propensity Score

Updated 31 December 2025
  • Redefined Propensity Score is a generalized framework that formalizes treatment assignment probabilities to achieve enhanced covariate balance and semiparametric efficiency.
  • It employs empirical score equations, calibration techniques, and convex optimization to overcome limitations of classical propensity scores, ensuring robustness in high-dimensional or misspecified contexts.
  • This approach bridges traditional causal inference with modern machine learning by targeting both balance and efficiency through rigorous testing and tailored loss functions.

A redefined propensity score formalizes, generalizes, or operationalizes the notion of treatment assignment probability to achieve properties beyond mere estimation of group assignment probabilities—such as explicit covariate balance, calibrated assignment probabilities, semiparametric efficiency, or robustness in high-dimensional or misspecified settings. Recent research has focused on unifying and expanding definitions of the propensity score using score equations, minimum-distance balancing, tailored loss functions, calibration, and information-theoretic projections. Several approaches operationalize these redefinitions through empirical moment equations, scoring rules, or convex optimization, directly targeting balance and efficiency in causal inference from observational data.

1. The Classical Propensity Score and Its Balancing Property

The propensity score, introduced by Rosenbaum and Rubin (1983), is defined as the conditional probability of receiving treatment given observed covariates:

$$e_0(X) = P_0(Z = 1 \mid X)$$

where $Z \in \{0,1\}$ is the treatment assignment and $X$ is the vector of pre-treatment covariates. The central property is that $e_0(X)$ is a balancing score: given $e_0(X)$, the distribution of $X$ is independent of $Z$. Moreover, $e_0(X)$ is the coarsest such score: no non-trivial function of $X$ coarser than $e_0(X)$ achieves this property. This balancing property is the basis for adjustment strategies such as matching, inverse weighting, and stratification (Hejazi et al., 2022).
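The balancing property can be seen directly in a small simulation. The following is a minimal sketch, not drawn from the cited papers: it generates synthetic data, fits a classical propensity score with scikit-learn, and checks that inverse-probability-weighted covariate means approximately agree across treatment groups.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))                                  # pre-treatment covariates
Z = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1]))))  # treatment

# Fit the classical propensity score e(X) = P(Z = 1 | X).
e_hat = LogisticRegression().fit(X, Z).predict_proba(X)[:, 1]

# Balancing property in action: IPW-reweighted covariate means should
# (approximately) agree between treated and control groups.
print("treated, IPW :", np.mean(Z[:, None] / e_hat[:, None] * X, axis=0))
print("control, IPW :", np.mean((1 - Z)[:, None] / (1 - e_hat)[:, None] * X, axis=0))
```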

2. Redefinition via Empirical Score Equations

Hejazi and van der Laan propose redefining the propensity score as any function that solves a system of empirical score equations based on a (possibly infinite) set $\mathcal{F}$ of functions:

$$P_n\big[f(X)\{Z - e_n(X)\}\big] = 0, \quad \forall\, f \in \mathcal{F}$$

where $P_n$ denotes the empirical mean. When $\mathcal{F}$ is the span of a chosen set of basis functions, this approach subsumes covariate balancing with finitely many moment conditions. Including the efficient-influence-function (EIF) weight $h(X) = \overline{Q}_n(1,X)/e_n(X)$ brings the estimator in line with semiparametric efficiency theory, and the solution coincides with the score equation for efficient IPW or AIPW estimation (Hejazi et al., 2022).

The following table illustrates these mappings:

| Construction | Requirement on $e_n$ | Main purpose |
|---|---|---|
| Classical PS | $e_0(X) = P_0(Z = 1 \mid X)$ | Covariate balancing |
| Score-equation PS | $P_n[f(X)\{Z - e_n(X)\}] = 0$ for all $f \in \mathcal{F}$ | Generalized balance, efficiency |

This approach generalizes the balancing requirement to arbitrary function classes, fostering machine-learning-driven or semiparametric estimation pipelines that jointly enforce balance and optimize estimator efficiency.
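A concrete, hedged illustration: an unpenalized logistic MLE already solves the empirical score equations exactly for every $f$ in the span of its features, while directions outside the span are not balanced by construction. The sketch below assumes arrays `X` and `Z` as in the earlier example and a recent scikit-learn (where `penalty=None` is supported).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
Z = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1]))))

# Unpenalized logistic MLE: solves P_n[f(X){Z - e_n(X)}] = 0 exactly
# for every f in the span of its features (including the intercept).
e_n = LogisticRegression(penalty=None).fit(X, Z).predict_proba(X)[:, 1]
resid = Z - e_n

for j in range(X.shape[1]):               # f(X) = X_j lies in the span
    print(f"P_n[X_{j}(Z - e_n)] = {np.mean(X[:, j] * resid):+.1e}")

# A direction outside the span is not balanced by construction:
print("P_n[X_0^2 (Z - e_n)] =", np.mean(X[:, 0] ** 2 * resid))
```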

3. Bridging Covariate Balance and Efficiency

Within modern semiparametric theory, efficiency of IPW or doubly robust estimators is attained only if the nuisance estimators—especially the propensity function—obey the score equations linked to the efficient influence function (EIF). The EIF for the counterfactual mean under treatment, $\tau_0 = E_0[R(1)]$, admits the form:

$$D^\star(P_0)(O) = \frac{Z}{e_0(X)}\{R - \overline{Q}_0(Z,X)\} + \overline{Q}_0(1,X) - \tau_0$$

Semiparametric efficiency is achieved if

$$P_n\big[h_n(X)\{Z - e_n(X)\}\big] = 0, \quad h_n(X) = \overline{Q}_n(1,X)/e_n(X)$$

is solved, in addition to constructing the IPW term explicitly. Thus, the “redefined” propensity score aligns the balancing property of $e_0$ with the efficiency property required for optimal ATE estimation.

This duality indicates that efficient estimators in modern causal inference should not merely "fit" $e_n$ for predictive accuracy but should ensure that it solves a rich set of balancing and EIF-inducing empirical equations (Hejazi et al., 2022).
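The two displays above translate directly into code. The following sketch (function and argument names are illustrative, not from the cited papers) assumes arrays for the outcome `R`, treatment `Z`, fitted propensities `e_n`, and outcome-model predictions `Qbar1` $= \overline{Q}_n(1,X)$ and `QbarZ` $= \overline{Q}_n(Z,X)$.

```python
import numpy as np

def aipw_counterfactual_mean(R, Z, e_n, Qbar1, QbarZ):
    """EIF-based (AIPW) estimate of tau_0 = E[R(1)]:
    tau_hat = P_n[ Z/e_n(X) * {R - Qbar_n(Z,X)} + Qbar_n(1,X) ]."""
    return np.mean(Z / e_n * (R - QbarZ) + Qbar1)

def eif_score_equation(Z, e_n, Qbar1):
    """Efficiency condition from the text: P_n[h_n(X){Z - e_n(X)}]
    with EIF weight h_n(X) = Qbar_n(1,X)/e_n(X); ~0 when solved."""
    return np.mean(Qbar1 / e_n * (Z - e_n))
```

If `eif_score_equation` returns a value far from zero, the fitted propensity does not solve the EIF-linked score equation and the plug-in AIPW estimator forfeits its efficiency guarantee.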

4. Calibration, Covariate Balancing, and Regularization

Calibration in the propensity score context demands that, among units with fitted propensity $e_n(X) = p$, the realized treatment rate equals $p$:

$$P(Z = 1 \mid e_n(X) = p) = p$$

This is necessary for unbiasedness of both IPW and AIPW estimators, and can be achieved through post-hoc recalibration (e.g., isotonic regression, Platt scaling), directly tightening IPTW estimation error bounds and preventing extreme weights. Conformal extensions yield finite-sample coverage guarantees (Deshpande et al., 2023).
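As a minimal sketch of the isotonic-regression option named above (not the exact procedure of Deshpande et al., 2023), one can recalibrate fitted propensities post hoc; in practice the recalibration map should be learned on a held-out split (cross-fitting), which is omitted here for brevity.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def recalibrate_isotonic(e_hat, Z, eps=1e-3):
    """Post-hoc isotonic recalibration of fitted propensities.
    Clipping to [eps, 1 - eps] guards against zero or extreme IPW weights."""
    iso = IsotonicRegression(y_min=eps, y_max=1 - eps, out_of_bounds="clip")
    return iso.fit_transform(e_hat, Z)

# Usage: e_cal = recalibrate_isotonic(e_hat, Z). The treated weights
# 1/e_cal are typically less extreme than 1/e_hat, tightening IPTW
# error bounds as described above.
```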

Covariate Balancing Scoring Rules (CBSR) use strictly proper scoring rules tailored to the estimand and link function, yielding loss functions whose minimization automatically enforces (approximate) covariate balancing. For example, maximizing the CBSR implied by the appropriate link and estimand ensures that the resulting inverse-weighted sample means of covariate functions match across groups (Zhao, 2016).

Regularized calibrated estimation further extends this perspective. In high-dimensional settings or with model misspecification, minimizing a calibration loss plus a Lasso penalty

$$\hat{\gamma}_{\mathrm{RCAL}} = \arg\min_{\gamma}\,\big\{\ell_{\mathrm{CAL}}(\gamma) + \lambda \|\gamma_{1:p}\|_1\big\}$$

controls both relative error and likelihood risk, yielding sparser, stable, and well-calibrated propensity estimates that guarantee improved weight properties over maximum likelihood or regularized-ML (Tan, 2017).
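A hedged sketch of this idea follows, using the logistic-link calibration loss $\ell_{\mathrm{CAL}}(\gamma) = P_n[Z e^{-\gamma^\top x} + (1-Z)\gamma^\top x]$ from Tan (2017) with an unpenalized intercept, solved by a simple proximal-gradient (ISTA) loop; the step size and iteration count are illustrative choices, not tuned values.

```python
import numpy as np

def rcal(X, Z, lam=0.01, step=0.05, iters=5000):
    """Regularized calibrated estimation (sketch): minimizes
    P_n[Z*exp(-g'x) + (1-Z)*g'x] + lam*||g_{1:p}||_1 by proximal gradient,
    where x includes an unpenalized intercept column."""
    Xt = np.column_stack([np.ones(len(Z)), X])
    gamma = np.zeros(Xt.shape[1])
    for _ in range(iters):
        lin = Xt @ gamma
        grad = np.mean(((1 - Z) - Z * np.exp(-lin))[:, None] * Xt, axis=0)
        gamma = gamma - step * grad
        # soft-thresholding (proximal step) on the non-intercept coordinates
        gamma[1:] = np.sign(gamma[1:]) * np.maximum(np.abs(gamma[1:]) - step * lam, 0.0)
    return gamma

# Stationarity of the calibration loss yields P_n[Z x / pi(x)] = P_n[x]:
# IPW-weighted treated covariate means match the overall means, which is
# the sense in which the resulting fit is "calibrated".
```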

5. Information Projection, Full Distributional Matching, and Alternative Redefinitions

Alternative redefinitions position the propensity score as the solution to moment-constrained information projections or infinite-dimensional minimum-distance balancing:

  • Information projection (I-projection): Estimates the inverse propensity via an exponential-tilt density ratio that satisfies covariate balancing moment constraints over a chosen set of functions. This approach can be penalized for variable selection and generalized to multivariate-missing-data settings. The resulting weights automatically satisfy self-efficiency, achieving semiparametric efficiency under correct specification (Wang et al., 2021); a minimal exponential-tilt sketch follows this list.
  • Integrated Propensity Score (IPS): Solves an infinite family of balancing equations to enforce entire joint distributional balance between treatment groups, instead of just matching finite moments. The IPS estimator operationalizes this as minimization over a sample criterion involving rich weighting functions, thus targeting global covariate balance for more stable and robust inverse weighting (Sant'Anna et al., 2018).
  • Outcome-adaptive propensity scores: Augment the covariate set with a summary of the predicted probability of outcome ("OP"), allowing the propensity model to leverage outcome–covariate relationships. This enhances statistical efficiency, controls variance, and improves robustness without increased dimensionality, especially in high-dimensional or complex-design settings (Yu et al., 2022).
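The sketch below conveys the exponential-tilt idea behind the first bullet; it is closest in form to entropy balancing rather than the exact estimator of Wang et al. (2021), and the quadratic basis is an illustrative choice. Weights on controls take the form $w_i \propto \exp(\lambda^\top f(X_i))$, with $\lambda$ chosen so that weighted control moments of $f$ match the treated moments.

```python
import numpy as np
from scipy.optimize import minimize

def exponential_tilt_weights(X, Z, basis=lambda x: np.column_stack([x, x ** 2])):
    """Exponential-tilt balancing weights for controls: w_i ~ exp(lam'f(X_i)),
    with lam solving a convex dual so that the weighted control moments of
    f match the treated moments. BFGS suffices for this small sketch."""
    f = basis(X)
    target = f[Z == 1].mean(axis=0)      # treated moments to match
    f0 = f[Z == 0]

    def dual(lam):                       # convex dual objective
        return np.log(np.exp(f0 @ lam).sum()) - lam @ target

    lam = minimize(dual, np.zeros(f.shape[1]), method="BFGS").x
    w = np.exp(f0 @ lam)
    return w / w.sum()                   # normalized control weights

# Usage: w0 = exponential_tilt_weights(X, Z); then
# (w0[:, None] * basis(X)[Z == 0]).sum(axis=0) matches the treated moments.
```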

6. Score-based Formal Testing of Balance

Redefining balance in terms of empirical score equations permits formal statistical tests for covariate balance post-fitting. For any $f(X)$, a score test

$$T_n(f) = \frac{P_n\big[f(X)\{Z - e_n(X)\}\big]}{\sqrt{P_n\big[f(X)^2\, e_n(X)\{1 - e_n(X)\}\big]/n}}$$

is asymptotically standard normal under the null that no residual imbalance remains in that direction. Testing across a family of $f \in \mathcal{F}$ inspects whether the chosen balancing function class sufficiently adjusts for confounding. The tests are strictly tied to the score-equation solutions: when the fitting procedure enforces exact balance in a subspace, the corresponding $T_n(f)$ will not reject in large samples (Hejazi et al., 2022).
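The statistic above is a few lines of code; this sketch (names illustrative) assumes arrays `fX` $= f(X)$, `Z`, and fitted propensities `e_n` as in the earlier examples.

```python
import numpy as np
from scipy.stats import norm

def balance_score_test(fX, Z, e_n):
    """T_n(f) from the display above; asymptotically N(0,1) under the
    null of no residual imbalance in direction f."""
    n = len(Z)
    T = np.mean(fX * (Z - e_n)) / np.sqrt(np.mean(fX ** 2 * e_n * (1 - e_n)) / n)
    return T, 2 * norm.sf(abs(T))        # statistic, two-sided p-value

# Usage: test a direction the fit did not balance, e.g.
# T, p = balance_score_test(X[:, 0] ** 2, Z, e_n)
```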

7. Implementation Strategies and Practical Implications

Redefined propensity score estimators motivate several practical implementation strategies:

  • Highly adaptive lasso (HAL), sieve-MLE, and targeted one-dimensional updates can be used to fit $e_n$ over a function space $\mathcal{F}$ rich enough to include the EIF weight for semiparametric efficiency.
  • Use of proper scoring rule–based losses (CBSR) and kernel methods to balance complex nonlinear functions.
  • Regularization (Lasso, elastic-net) for high-dimensional covariate sets.
  • Outcome-adaptive covariate augmentation to recover efficiency in machine-learning-driven propensity estimation.
  • Direct minimization of moment-constrained information divergence or minimum-distance balancing functionals.

These methods unify the traditional roles of the propensity score as a balancing device and an efficiency–enabling nuisance function for doubly robust inference (Hejazi et al., 2022, Tan, 2017, Zhao, 2016, Deshpande et al., 2023, Sant'Anna et al., 2018, Wang et al., 2021, Yu et al., 2022).

References

  • "Revisiting the propensity score's central role: Towards bridging balance and efficiency in the era of causal machine learning" (Hejazi et al., 2022)
  • "Information projection approach to propensity score estimation for handling selection bias under missing at random" (Wang et al., 2021)
  • "Calibrated and Conformal Propensity Scores for Causal Effect Estimation" (Deshpande et al., 2023)
  • "Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data" (Tan, 2017)
  • "Covariate Balancing Propensity Score by Tailored Loss Functions" (Zhao, 2016)
  • "Covariate Distribution Balance via Propensity Scores" (Sant'Anna et al., 2018)
  • "Outcome Adaptive Propensity Score Methods for Handling Censoring and High-Dimensionality: Application to Insurance Claims" (Yu et al., 2022)
