Estimating causal vs non-causal deviations and optimal source-domain selection

Determine practical procedures to estimate the Causal Deviation Δ_c and the Non-Causal Deviation Δ_{nc} in the constrained optimization-based causal discovery framework for stock prediction. Here Δ_c and Δ_{nc} are defined as the minimal average squared deviation, across source domains, of the domain-wise least-squares regression coefficients on the components of φ(X) that correspond to causal and to non-causal features, respectively. Also identify the selection and partitioning of source domains (i.e., training horizon and domain construction) that yields favorable values of Δ_c and Δ_{nc} and thereby enables reliable discovery of causal features.

Background

In the causal discovery section, the authors introduce a constrained optimization approach to learn invariant feature representations φ(X) that maintain a stable linear relationship with standardized returns across domains. They define two key quantities: the Causal Deviation Δ_c and the Non-Causal Deviation Δ_{nc}, which quantify, respectively, the variance across source domains of the least-squares coefficients associated with causal and non-causal (spurious) features.
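To make the two quantities concrete, here is a minimal Python sketch of a plug-in estimate on synthetic data. It rests on assumptions that are not taken from the paper: the "minimal average squared deviation" is read as the average squared distance of the per-domain coefficients from a single reference vector, which is minimized by their cross-domain mean; the indices of the causal and non-causal components of φ(X) are treated as known, which is only true in a simulation; and the function names (domain_coefficients, deviation, simulate) are hypothetical.

```python
import numpy as np


def domain_coefficients(features, returns, domain_ids):
    """Fit one least-squares coefficient vector per source domain."""
    coefs = []
    for d in np.unique(domain_ids):
        mask = domain_ids == d
        beta, *_ = np.linalg.lstsq(features[mask], returns[mask], rcond=None)
        coefs.append(beta)
    return np.vstack(coefs)  # shape: (n_domains, n_features)


def deviation(coefs, feature_idx):
    """Average squared deviation of the selected coefficients from their
    cross-domain mean (the mean minimizes the average squared distance)."""
    sub = coefs[:, feature_idx]
    return float(np.mean(np.sum((sub - sub.mean(axis=0)) ** 2, axis=1)))


def simulate(n_domains=8, n_per_domain=500, seed=0):
    """Synthetic data (assumption): feature 0 has a fixed effect on returns,
    feature 1 has an effect that is redrawn in every domain."""
    rng = np.random.default_rng(seed)
    x_parts, y_parts, id_parts = [], [], []
    for d in range(n_domains):
        x = rng.normal(size=(n_per_domain, 2))
        spurious_weight = rng.normal()
        noise = 0.1 * rng.normal(size=n_per_domain)
        y = 0.5 * x[:, 0] + spurious_weight * x[:, 1] + noise
        x_parts.append(x)
        y_parts.append(y)
        id_parts.append(np.full(n_per_domain, d))
    return np.vstack(x_parts), np.concatenate(y_parts), np.concatenate(id_parts)


features, returns, domain_ids = simulate()
coefs = domain_coefficients(features, returns, domain_ids)
delta_c = deviation(coefs, [0])   # causal component (known only in simulation)
delta_nc = deviation(coefs, [1])  # non-causal component
print(f"delta_c={delta_c:.6f}  delta_nc={delta_nc:.6f}")
```

On real data the causal index set is exactly what is unknown, so an estimate of this kind could at most serve as a diagnostic on candidate features rather than as a direct measurement of Δ_c and Δ_{nc}.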

Proposition 4.1 shows that if Δ_c < Δ_{nc}, the optimization discovers generalizable causal features under suitable bounds. However, the authors note that both Δ_c and Δ_{nc} depend on the choice of training horizon and how source domains are constructed, and that estimating these quantities or finding the best domain configuration is non-trivial. They explicitly mark this as an open question, highlighting the need for methodological development to estimate these deviations and design domain selection strategies that facilitate correct causal discovery.
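The dependence on domain construction can be illustrated with a small Python sketch that splits one time-ordered sample into different numbers of contiguous domains and recomputes both deviations. Everything here is an assumption made for illustration and not the authors' procedure: the synthetic regime-switching data, the known causal and non-causal indices, the helper name deviations_for_partition, and the gap-based choice of best_k, which is only a hypothetical heuristic.

```python
import numpy as np


def deviations_for_partition(features, returns, n_domains, causal_idx, noncausal_idx):
    """Split time-ordered samples into contiguous domains, fit least squares
    in each domain, and return (delta_c, delta_nc) as average squared
    deviations of the coefficients from their cross-domain mean."""
    blocks = np.array_split(np.arange(len(returns)), n_domains)
    coefs = np.vstack([
        np.linalg.lstsq(features[b], returns[b], rcond=None)[0] for b in blocks
    ])
    centered = coefs - coefs.mean(axis=0)
    delta_c = float(np.mean(np.sum(centered[:, causal_idx] ** 2, axis=1)))
    delta_nc = float(np.mean(np.sum(centered[:, noncausal_idx] ** 2, axis=1)))
    return delta_c, delta_nc


# Synthetic series (assumption): feature 0 has a fixed effect, while the
# effect of feature 1 flips sign every 500 time steps.
rng = np.random.default_rng(1)
n_samples = 4000
features = rng.normal(size=(n_samples, 2))
regime = (np.arange(n_samples) // 500) % 2
spurious_weight = np.where(regime == 0, 1.0, -1.0)
returns = (
    0.5 * features[:, 0]
    + spurious_weight * features[:, 1]
    + 0.1 * rng.normal(size=n_samples)
)

results = {}
for k in (2, 4, 8, 16):
    results[k] = deviations_for_partition(features, returns, k, [0], [1])
    print(k, results[k])

# Hypothetical selection heuristic: prefer the partition with the largest
# gap delta_nc - delta_c. The source does not propose this criterion.
best_k = max(results, key=lambda k: results[k][1] - results[k][0])
```

In this toy setting, coarse partitions average opposite regimes inside each domain, so the spurious coefficient looks stable and the estimated Δ_{nc} collapses toward zero, while partitions aligned with the regimes expose the instability. Very fine partitions leave fewer samples per domain, which adds estimation noise to both quantities.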

References

However, it is non-trivial to estimate their values or find the best collection of domains. We leave this as an open question, and in this work, we assume that Δ_{nc} is low considering the large number of domains in our settings.

A Causal Perspective of Stock Prediction Models (2503.20987 - Xu et al., 26 Mar 2025) in Section 4.1 (Discovering (δ_f, δ_ε)-Generalizable Causal Features)