TRA-C: Causal Inference with Abstention
- TRA-C is a statistical methodology that infers causal direction from observational bivariate data using geometric residual properties and a principled abstention option to avoid spurious claims.
- It leverages nonparametric copula transformations, rank-based Gaussianization, and a semiparametric bootstrap to calibrate the null model against pure dependence.
- Empirical evaluations, including on the Tübingen benchmark, demonstrate TRA-C’s ability to control false discovery rates and deliver high accuracy in the presence of latent confounding.
TRA-C (Topological Residual Asymmetry with Confounding-aware Abstention) is a statistical methodology for inferring the causal direction between two variables from purely observational bivariate data. Developed as an extension of the Topological Residual Asymmetry (TRA) framework, TRA-C introduces a principled “abstain” option to account for scenarios where observed dependence may arise from latent confounding rather than a true causal relationship. The approach leverages geometric properties of residual distributions, nonparametric copula transformations, and a semiparametric bootstrap calibrated to the pure-dependence null, delivering robust causal inferences with controlled false discovery rates in the presence of potential confounding (Bouchattaoui, 31 Jan 2026).
1. TRA-C: Definition and Theoretical Motivation
TRA-C augments the geometry-based TRA criterion by adding a statistically calibrated reject/abstain mechanism. The core motivation is to address the tendency of previous causal inference methods—both constraint-based and score-based—to erroneously assert a causal direction even in settings consistent with pure dependence, such as those induced by latent variables or weakly identifiable models.
The standard TRA score is defined as the signed difference of two topological persistence statistics derived from regressor-residual clouds after copula standardization: where denotes the normalized count of Minimum Spanning Tree (MST) edge lengths in a specified mesoscopic window, operating on residual clouds formed via cross-fitted regression and rank-based marginal Gaussianization. TRA-C evaluates the magnitude and assesses whether it is unusually large relative to what is expected under a null model of pure dependence (i.e., potential confounding but no directionality). If the observed asymmetry is not statistically significant, TRA-C abstains from declaring a causal arrow.
2. TRA-C Statistical Decision Rule
TRA-C formalizes its decision rule using a Monte Carlo p-value computed under a calibrated Gaussian-copula bootstrap null:
- Compute the observed score at the chosen mesoscopic window.
- Generate bootstrap surrogates by simulating from the estimated Gaussian copula with empirical margins.
- Calculate
- If (with the user-specified significance level, typically 0.10), output the sign of as the inferred direction ( if , if ). Otherwise, abstain.
This two-sided thresholding prevents spurious declarations of direction in confounded or near-symmetric regimes. The p-value is computed conditional on the observed Gaussian copula fit, incorporating both the empirical dependency structure and the marginal distributions (Bouchattaoui, 31 Jan 2026).
3. Gaussian-Copula Plug-in Bootstrap Calibration
To instantiate the pure-dependence null used in the above decision rule, TRA-C employs a plug-in Gaussian-copula bootstrap, executed in the following sequence:
- Rank-Gaussianization: For each observation,
where is the standard normal quantile function. The empirical copula correlation is computed from .
- Bootstrap Sampling: For each bootstrap sample:
- Draw i.i.d. samples from .
- Map back to uniform scores via .
- Apply inverse empirical marginals .
- Form dataset and recompute .
- P-value Computation: The Monte Carlo p-value is as given above. The decision to abstain is thus calibrated by the empirical distribution of under the estimated copula model.
This procedure operationalizes a no-direction baseline that accurately mimics the dependence structure of the original data but strips away any potential directional signal, allowing rigorous statistical testing of asymmetry beyond pure confounding.
4. Algorithmic Implementation
The practical implementation of TRA-C (Algorithm 1 in (Bouchattaoui, 31 Jan 2026)) is as follows:
1 2 3 4 5 6 7 8 9 |
Input: Data Dₙ = { (Xᵢ, Yᵢ) }, TRA score function, bootstraps B, level α
1. Compute TRA score Δₙ and Sₙ = |Δₙ|
2. Rank-Gaussianize marginals to get Ẑ_X and Ẑ_Y; estimate ρ̂ₙ
3. For b = 1..B:
- Sample n i.i.d. (Z_X^*, Z_Y^*) ~ N(0, Σ(ρ̂ₙ))
- Transform to uniform, then to empirical marginals
- Compute Sₙ^{*(b)} = |TRA score on bootstrap|
4. Compute empirical p̂ₙ
5. If p̂ₙ > α: abstain; else: return sign(Δₙ) as direction |
This workflow ensures that directional inferences are only made when there is statistically significant evidence of asymmetry in residual topology, under a carefully matched null.
5. Theoretical Properties
TRA-C offers rigorous statistical guarantees for abstention calibration. Theorem 3 in (Bouchattaoui, 31 Jan 2026) proves that, under any data-generating process in the Gaussian-copula family with copula parameter , the asymptotic Type I error of wrongly asserting a direction (i.e., not abstaining under pure dependence) satisfies
where denotes the event that TRA-C does not abstain. Thus, under the null hypothesis of no causal direction (pure confounding or symmetric dependence), the frequency of false discoveries vanishes at the nominal level as sample size grows. This result is critical for applications where even modest error rates can have substantive consequences.
6. Empirical Evidence and Comparative Performance
A comprehensive empirical evaluation in (Bouchattaoui, 31 Jan 2026) demonstrates TRA-C’s superiority in challenging settings. In simulation studies involving both linear and nonlinear latent confounding, prominent alternatives (RESIT, IGCI, CDCI, RCC, bQCD, SLOPPY) almost always commit to a direction, yielding near-unity coverage when no true arrow exists. TRA-C contrasts by abstaining in the majority of such cases, maintaining coverage rates near the specified and exhibiting controlled false call rates.
On the Tübingen benchmark (30 univariate real-world pairs), TRA-C displays the highest accuracy among cases where it decides a direction (often >90%) and the lowest directed risk among all evaluated methods. While TRA-s (the uncalibrated fixed-noise variant) achieves higher coverage, only TRA-C robustly guards against overcalling in ambiguous regimes. TRA-C’s abstention mechanism thus promotes reliability over mere decisiveness.
7. Advantages, Limitations, and Applications
TRA-C’s core strengths are:
- Robust geometry-based score capturing residual independence structures, invariant under monotonic marginal transformations.
- Proven consistency for additive-noise models and controlled error rates in confounded or non-identifiable regimes via calibrated abstention.
- Effective empirical performance across synthetic and real data, even under substantial latent confounding.
A notable consequence is that, in problems where the true data-generating process may involve unmeasured confounders or violate conventional assumptions, TRA-C substantially reduces the risk of spurious causal claims—a limitation endemic to many existing methods.
Primary applications of TRA-C include observational sciences where causal directionality must be inferred without the benefit of interventions or known time-ordering, and where abstaining rather than making unwarranted claims is of substantive value.
For in-depth algorithmic details, theoretical proofs, and extensive empirical results, see "Topological Residual Asymmetry for Bivariate Causal Direction" (Bouchattaoui, 31 Jan 2026).