Papers
Topics
Authors
Recent
Search
2000 character limit reached

TRA-C: Causal Inference with Abstention

Updated 7 February 2026
  • TRA-C is a statistical methodology that infers causal direction from observational bivariate data using geometric residual properties and a principled abstention option to avoid spurious claims.
  • It leverages nonparametric copula transformations, rank-based Gaussianization, and a semiparametric bootstrap to calibrate the null model against pure dependence.
  • Empirical evaluations, including on the Tübingen benchmark, demonstrate TRA-C’s ability to control false discovery rates and deliver high accuracy in the presence of latent confounding.

TRA-C (Topological Residual Asymmetry with Confounding-aware Abstention) is a statistical methodology for inferring the causal direction between two variables from purely observational bivariate data. Developed as an extension of the Topological Residual Asymmetry (TRA) framework, TRA-C introduces a principled “abstain” option to account for scenarios where observed dependence may arise from latent confounding rather than a true causal relationship. The approach leverages geometric properties of residual distributions, nonparametric copula transformations, and a semiparametric bootstrap calibrated to the pure-dependence null, delivering robust causal inferences with controlled false discovery rates in the presence of potential confounding (Bouchattaoui, 31 Jan 2026).

1. TRA-C: Definition and Theoretical Motivation

TRA-C augments the geometry-based TRA criterion by adding a statistically calibrated reject/abstain mechanism. The core motivation is to address the tendency of previous causal inference methods—both constraint-based and score-based—to erroneously assert a causal direction even in settings consistent with pure dependence, such as those induced by latent variables or weakly identifiable models.

The standard TRA score is defined as the signed difference of two topological persistence statistics derived from regressor-residual clouds after copula standardization: Δn=TP0[αn,βn](R˚YX(n))TP0[αn,βn](R˚XY(n)),\Delta_n = TP_0^{[\alpha_n, \beta_n]}\big( \mathring{R}^{(n)}_{Y \mid X} \big) - TP_0^{[\alpha_n, \beta_n]}\big( \mathring{R}^{(n)}_{X \mid Y} \big), where TP0[αn,βn]TP_0^{[\alpha_n, \beta_n]} denotes the normalized count of Minimum Spanning Tree (MST) edge lengths in a specified mesoscopic window, operating on residual clouds formed via cross-fitted regression and rank-based marginal Gaussianization. TRA-C evaluates the magnitude Sn=ΔnS_n = |\Delta_n| and assesses whether it is unusually large relative to what is expected under a null model of pure dependence (i.e., potential confounding but no directionality). If the observed asymmetry is not statistically significant, TRA-C abstains from declaring a causal arrow.

2. TRA-C Statistical Decision Rule

TRA-C formalizes its decision rule using a Monte Carlo p-value computed under a calibrated Gaussian-copula bootstrap null:

  • Compute the observed score Sn=ΔnS_n = |\Delta_n| at the chosen mesoscopic window.
  • Generate bootstrap surrogates {Sn(b)}\{ S_n^{*(b)} \} by simulating from the estimated Gaussian copula with empirical margins.
  • Calculate

p^n=1+#{b=1B:Sn(b)Sn}B+1.\hat{p}_n = \frac{1 + \#\{b=1\dots B : S_n^{*(b)} \geq S_n\}}{B+1}.

  • If p^nα\hat{p}_n \leq \alpha (with α\alpha the user-specified significance level, typically 0.10), output the sign of Δn\Delta_n as the inferred direction (XYX \rightarrow Y if Δn>0\Delta_n > 0, YXY \rightarrow X if Δn<0\Delta_n < 0). Otherwise, abstain.

This two-sided thresholding prevents spurious declarations of direction in confounded or near-symmetric regimes. The p-value is computed conditional on the observed Gaussian copula fit, incorporating both the empirical dependency structure and the marginal distributions (Bouchattaoui, 31 Jan 2026).

3. Gaussian-Copula Plug-in Bootstrap Calibration

To instantiate the pure-dependence null used in the above decision rule, TRA-C employs a plug-in Gaussian-copula bootstrap, executed in the following sequence:

  1. Rank-Gaussianization: For each observation,

Z^X,i=Φ1(rank(Xi)n+1),Z^Y,i=Φ1(rank(Yi)n+1),\hat{Z}_{X,i} = \Phi^{-1}\Big( \frac{\mathrm{rank}(X_i)}{n+1} \Big), \qquad \hat{Z}_{Y,i} = \Phi^{-1}\Big( \frac{\mathrm{rank}(Y_i)}{n+1} \Big),

where Φ1\Phi^{-1} is the standard normal quantile function. The empirical copula correlation ρ^n\hat\rho_n is computed from (Z^X,Z^Y)(\hat{Z}_X, \hat{Z}_Y).

  1. Bootstrap Sampling: For each bootstrap sample:
    • Draw nn i.i.d. samples (ZX,i,ZY,i)(Z_{X,i}^*, Z_{Y,i}^*) from N(0,Σ(ρ^n))\mathcal{N}(0, \Sigma(\hat\rho_n)).
    • Map back to uniform scores via (UX,i,UY,i)=(Φ(ZX,i),Φ(ZY,i))(U_{X,i}^*, U_{Y,i}^*) = (\Phi(Z_{X,i}^*), \Phi(Z_{Y,i}^*)).
    • Apply inverse empirical marginals (Xi,Yi)=(F^X1(UX,i),F^Y1(UY,i))(X_i^*, Y_i^*) = (\hat{F}_X^{-1}(U_{X,i}^*), \hat{F}_Y^{-1}(U_{Y,i}^*)).
    • Form dataset Dn(b)D_n^{*(b)} and recompute Sn(b)=Δn(b)S_n^{*(b)} = |\Delta_n^{*(b)}|.
  2. P-value Computation: The Monte Carlo p-value is as given above. The decision to abstain is thus calibrated by the empirical distribution of SnS_n under the estimated copula model.

This procedure operationalizes a no-direction baseline that accurately mimics the dependence structure of the original data but strips away any potential directional signal, allowing rigorous statistical testing of asymmetry beyond pure confounding.

4. Algorithmic Implementation

The practical implementation of TRA-C (Algorithm 1 in (Bouchattaoui, 31 Jan 2026)) is as follows:

1
2
3
4
5
6
7
8
9
Input: Data Dₙ = { (Xᵢ, Yᵢ) }, TRA score function, bootstraps B, level α
1. Compute TRA score Δₙ and Sₙ = |Δₙ|
2. Rank-Gaussianize marginals to get Ẑ_X and Ẑ_Y; estimate ρ̂ₙ
3. For b = 1..B:
    - Sample n i.i.d. (Z_X^*, Z_Y^*) ~ N(0, Σ(ρ̂ₙ))
    - Transform to uniform, then to empirical marginals
    - Compute Sₙ^{*(b)} = |TRA score on bootstrap|
4. Compute empirical p̂ₙ
5. If p̂ₙ > α: abstain; else: return sign(Δₙ) as direction

This workflow ensures that directional inferences are only made when there is statistically significant evidence of asymmetry in residual topology, under a carefully matched null.

5. Theoretical Properties

TRA-C offers rigorous statistical guarantees for abstention calibration. Theorem 3 in (Bouchattaoui, 31 Jan 2026) proves that, under any data-generating process in the Gaussian-copula family with copula parameter ρ0\rho_0, the asymptotic Type I error of wrongly asserting a direction (i.e., not abstaining under pure dependence) satisfies

lim supnPρ0(Rn)α,\limsup_{n} P_{\rho_0}(R_n) \leq \alpha,

where RnR_n denotes the event that TRA-C does not abstain. Thus, under the null hypothesis of no causal direction (pure confounding or symmetric dependence), the frequency of false discoveries vanishes at the nominal level α\alpha as sample size grows. This result is critical for applications where even modest error rates can have substantive consequences.

6. Empirical Evidence and Comparative Performance

A comprehensive empirical evaluation in (Bouchattaoui, 31 Jan 2026) demonstrates TRA-C’s superiority in challenging settings. In simulation studies involving both linear and nonlinear latent confounding, prominent alternatives (RESIT, IGCI, CDCI, RCC, bQCD, SLOPPY) almost always commit to a direction, yielding near-unity coverage when no true arrow exists. TRA-C contrasts by abstaining in the majority of such cases, maintaining coverage rates near the specified α\alpha and exhibiting controlled false call rates.

On the Tübingen benchmark (30 univariate real-world pairs), TRA-C displays the highest accuracy among cases where it decides a direction (often >90%) and the lowest directed risk among all evaluated methods. While TRA-s (the uncalibrated fixed-noise variant) achieves higher coverage, only TRA-C robustly guards against overcalling in ambiguous regimes. TRA-C’s abstention mechanism thus promotes reliability over mere decisiveness.

7. Advantages, Limitations, and Applications

TRA-C’s core strengths are:

  • Robust geometry-based score capturing residual independence structures, invariant under monotonic marginal transformations.
  • Proven consistency for additive-noise models and controlled error rates in confounded or non-identifiable regimes via calibrated abstention.
  • Effective empirical performance across synthetic and real data, even under substantial latent confounding.

A notable consequence is that, in problems where the true data-generating process may involve unmeasured confounders or violate conventional assumptions, TRA-C substantially reduces the risk of spurious causal claims—a limitation endemic to many existing methods.

Primary applications of TRA-C include observational sciences where causal directionality must be inferred without the benefit of interventions or known time-ordering, and where abstaining rather than making unwarranted claims is of substantive value.


For in-depth algorithmic details, theoretical proofs, and extensive empirical results, see "Topological Residual Asymmetry for Bivariate Causal Direction" (Bouchattaoui, 31 Jan 2026).

Definition Search Book Streamline Icon: https://streamlinehq.com
References (1)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to TRA-C.