Proximal Causal Learning (PCL)

Updated 30 June 2025
  • Proximal Causal Learning (PCL) is a framework that uses measured proxies to address unmeasured confounding in observational causal inference.
  • It leverages bridge functions and the proximal g-formula to generalize traditional methods for more reliable causal effect estimates.
  • PCL is applicable to both point treatment and time-varying settings, offering robust identification when standard assumptions fail.

Proximal Causal Learning (PCL) is a formal framework for causal inference in observational data that systematically addresses settings plagued by unmeasured confounding, where measured covariates serve only as proxies—imperfect measurements—of the true confounding mechanisms. This approach extends beyond the conventional exchangeability assumption, providing nonparametric identification and estimation procedures for causal effects even when classic "no unmeasured confounding" conditions fail. PCL generalizes foundational causal methods such as the g-formula and g-computation, enabling practitioners to draw credible causal inferences using appropriately classified and sufficiently informative proxies.

1. Conceptual Foundations and Framework

PCL is motivated by the observation that, in practice, investigators seldom measure all confounders perfectly. Instead, covariate measurements are often noisy, capturing only partial information about the true, unobserved confounders. In the potential outcome framework, let $A$ denote the treatment, $Y$ the outcome, $L$ the measured covariates, and $U$ the unmeasured confounders. The core estimand is the mean potential outcome under treatment $a$, denoted $\beta(a) = \mathbb{E}[Y_a]$.

Traditional approaches assume exchangeability, $Y_a \perp A \mid L$, but PCL recognizes that this condition is rarely satisfied. Therefore, it partitions the measured covariates into three groups:

  • $X$: common causes of $A$ and $Y$ (possibly well-measured confounders);
  • $Z$: treatment-inducing proxies correlated with the unmeasured confounders and $A$, but not direct causes of $Y$;
  • $W$: outcome-inducing proxies correlated with the unmeasured confounders and $Y$, but not direct causes of $A$.

A typical linear model illustrating this framework is

$$\mathbb{E}[Y \mid A, Z, X, U] = \beta_0 + \beta_a A + \beta_u U + \beta_x' X, \qquad \mathbb{E}[W \mid A, Z, X, U] = \eta_0 + \eta_u U + \eta_x' X,$$

where $W$ proxies the latent $U$.
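
The bias that this kind of latent confounding induces can be simulated; the following sketch assumes invented coefficient values and unit-variance Gaussian noise throughout, none of which come from the text.

```python
import numpy as np

# Illustrative simulation of the linear proxy model above.
# beta_a, beta_u, eta_u and all noise scales are assumed values.
rng = np.random.default_rng(0)
n = 100_000
beta_a, beta_u, eta_u = 1.0, 2.0, 0.8

U = rng.normal(size=n)                          # unmeasured confounder
Z = U + rng.normal(size=n)                      # treatment-inducing proxy of U
W = eta_u * U + rng.normal(size=n)              # outcome-inducing proxy of U
A = 0.5 * U + rng.normal(size=n)                # treatment depends on U
Y = beta_a * A + beta_u * U + rng.normal(size=n)

# A naive regression of Y on A omits U and is therefore biased:
naive_slope = np.cov(A, Y)[0, 1] / np.var(A)
print(naive_slope)  # substantially above the true effect beta_a = 1.0
```

The gap between the naive slope and $\beta_a$ is the unmeasured-confounding bias that the proxies $Z$ and $W$ are meant to remove.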

2. Challenges, Assumptions, and Bridge Function Approach

Main challenges in causal learning from proxies:

  • Identification of causal effects is an ill-posed inverse problem due to unmeasured confounding.
  • Reliance on proxies introduces dependence on how well they “cover” the hidden confounder.

PCL addresses these by introducing:

  • A rigorous classification of measured variables into confounders and proxies.
  • Conditional independence assumptions, including:
    • $Y \perp Z \mid A, U, X$ (treatment-proxy independence),
    • $W \perp (A, Z) \mid U, X$ (outcome-proxy independence).

Completeness is crucial: proxies must be "rich enough" to allow for the inversion of the mapping from proxies to confounders. In the categorical case, this requires $\min(d_z, d_w) \geq d_u$, where $d_z$, $d_w$, and $d_u$ are the numbers of categories of $Z$, $W$, and $U$, respectively.
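
In the categorical case, this requirement can be checked as a rank condition on the matrix of conditional proxy probabilities; the numbers below are purely illustrative.

```python
import numpy as np

# Hypothetical f(z | u) with 3 categories of Z and 2 of U, so d_z >= d_u.
# Completeness here amounts to this matrix having full column rank.
P = np.array([[0.6, 0.1],
              [0.3, 0.2],
              [0.1, 0.7]])   # rows: categories of Z, columns: categories of U

print(np.linalg.matrix_rank(P))  # 2 = number of U categories: completeness holds
```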

The core of identification is the bridge function:

$$\mathbb{E}[Y \mid a, z, x] = \sum_w h(a, x, w) f(w \mid a, x, z),$$

which is a Fredholm integral equation of the first kind (here in its discrete form), solved for $h$.
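
For categorical $Z$ and $W$ at a fixed $(a, x)$, the bridge equation is a finite linear system that can be solved directly; the probabilities and conditional means below are made-up values for the sketch.

```python
import numpy as np

# Bridge equation at fixed (a, x): E[Y | a, z] = sum_w h(a, w) f(w | a, z),
# i.e. F @ h = b with F[z, w] = f(w | a, z). All numbers are illustrative.
F = np.array([[0.7, 0.3],
              [0.2, 0.8]])       # f(w | a, z): rows index z, columns index w
b = np.array([1.4, 2.6])         # E[Y | a, z] for each category z

# With completeness (F has full column rank), least squares recovers h(a, w):
h, *_ = np.linalg.lstsq(F, b, rcond=None)
print(h)
```

In continuous, ill-posed settings the analogous inversion requires regularization, which is why penalized estimators appear in the estimation algorithm.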

3. Proximal Identification, g-Formula, and Algorithms

Given the above structure, the proximal g-formula identifies the mean potential outcome as

$$\beta(a) = \sum_{w, x} h(a, x, w) f(w, x),$$

where $h$ is the solution to the bridge equation. In the classical setting, this reduces to the standard g-formula.
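
With categorical $W$ (suppressing $X$ for brevity), the proximal g-formula is a weighted sum over the proxy distribution; the bridge values and marginal probabilities below are hypothetical.

```python
import numpy as np

# Proximal g-formula sketch: beta(a) = sum_w h(a, w) f(w), ignoring X.
# h_a and f_w are assumed illustrative values, not estimates from data.
h_a = np.array([0.68, 3.08])   # bridge function h(a, w) at one treatment level a
f_w = np.array([0.4, 0.6])     # observed marginal distribution f(w)

beta_a = float(np.sum(h_a * f_w))
print(beta_a)
```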

Algorithm for estimation:

  1. Specify and fit models for the proxy distribution $f$ and the bridge function $h$.
  2. Fit $h$ by penalized regression (e.g., penalized least squares or maximum likelihood) using the observed $(A, X, Z, W)$.
  3. Estimate $\beta(a)$ by averaging the fitted bridge function's predictions over the observed proxies.

Special case—Proximal 2SLS: If all models are linear:

  • Stage 1: Predict $W$ from $(Z, A, X)$.
  • Stage 2: Regress $Y$ on $(A, X, \widehat{W})$.

This structure generalizes the familiar instrumental variable estimators to the case of proxies for confounding.
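
A minimal proximal 2SLS sketch under an assumed linear Gaussian data-generating process (no $X$, all coefficients invented for illustration) might look like:

```python
import numpy as np

# Proximal 2SLS sketch: the true causal effect of A on Y is 1.0 by construction.
rng = np.random.default_rng(1)
n = 200_000
U = rng.normal(size=n)                       # unmeasured confounder
Z = U + rng.normal(size=n)                   # treatment-inducing proxy
W = U + rng.normal(size=n)                   # outcome-inducing proxy
A = 0.5 * U + rng.normal(size=n)
Y = 1.0 * A + 2.0 * U + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients with an intercept prepended."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Stage 1: predict the outcome-inducing proxy W from (Z, A).
g = ols(np.column_stack([Z, A]), W)
W_hat = g[0] + g[1] * Z + g[2] * A

# Stage 2: regress Y on (A, W_hat); the coefficient on A estimates the effect.
coef = ols(np.column_stack([A, W_hat]), Y)
print(coef[1])  # close to the true effect 1.0, unlike the naive regression
```

Here the fitted proxy $\widehat{W}$ absorbs the confounding that $U$ would otherwise induce in the second-stage regression.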

4. Sufficient Conditions, Robustness, and Generalizations

Sufficient conditions for identification include:

  • Conditional independence of proxies given the unobserved confounder and relevant variables.
  • Completeness of the proxies with respect to $U$.
  • Existence of a solution to the bridge equation.

When these are satisfied, and the proxies are informative, PCL provides nonparametric identification even when standard adjustment fails.

For time-varying treatments:

  • The approach generalizes recursively, with sequential bridge functions that allow for the estimation of longitudinal or dynamic causal effects.
  • The "longitudinal proximal g-formula" provides a path to identification in settings where the sequential randomization assumption cannot be justified.

5. Applications and Empirical Illustration

SUPPORT Study (Right Heart Catheterization):

  • Treatment: RHC (Yes/No), Outcome: 30-day survival.
  • Multiple measured physiological covariates: 10 are candidate proxies for unmeasured severity.
  • Proxies $Z$ and $W$ allocated based on observed associations.
  • Standard OLS: −1.25 days (SE 0.28); proximal 2SLS: −1.80 days (SE 0.43). Conventional methods understate the harm attributable to unmeasured confounding.

Longitudinal Methotrexate Study:

  • Methotrexate therapy in rheumatoid arthritis patients.
  • Time-varying proxies assigned at each visit.
  • Recursive algorithm with linear bridge shows a more protective effect than traditional methods, highlighting PCL's ability to correct for longitudinal latent confounding.

6. Point Treatment vs. Time-Varying Settings and Implementation Considerations

Point Treatment:

  • Proximal identification and estimation as above, with possible application of 2SLS-type algorithms when bridge functions are linear.

Time-Varying:

  • Repeat estimation of proxies and bridge functions at each time point.
  • Backward recursion is used for dynamic treatment regime estimation, with improved robustness to misspecification in early-stage models.

Implementation Considerations:

  • Correct proxy variable classification is critical.
  • Satisfying completeness and independence assumptions is nontrivial and relies on subject-matter expertise.
  • The methodology's robustness to model misspecification depends on the use of recursive estimation and the inclusion of rich, informative proxies.

The Proximal Causal Learning framework generalizes conventional causal inference by enabling nonparametric identification and estimation of causal effects in the presence of unmeasured confounding, provided that proxy variables are available and appropriately leveraged. Through the proximal g-formula and generalized computation algorithms, PCL offers practical tools for both static and dynamic causal questions, quantitatively addressing the pervasive challenge of hidden confounders in observational research. Analyses of real and simulated data demonstrate that PCL often reveals stronger or different effects than conventional methods, directly attributable to its explicit modeling of the latent confounding structure.
