Causal Sinkhorn DRO
- Causal-SDRO is a framework that unifies causal optimal transport and entropic regularization to hedge against ambiguity in the data-generating distribution while enforcing temporal or structural causality.
- It formulates both primal and dual optimization problems to yield tractable, interpretable decision rules applicable to generative modeling, policy learning, and portfolio selection.
- Empirical studies show that Causal-SDRO significantly improves out-of-sample prescriptiveness and reduces uncertainty compared to standard methods in ambiguity-averse decision-making.
Causal Sinkhorn Distributionally Robust Optimization (Causal-SDRO) is a theoretical and algorithmic framework that unifies ambiguity-averse decision-making, optimal-transport modeling of distribution shifts, and explicit enforcement of temporal or structural causality in the ambiguity set. By combining causal optimal transport constraints with entropic regularization—operationalized via the Sinkhorn algorithm—Causal-SDRO enables robust optimization and learning procedures that are both computationally tractable and sensitive to the underlying directionality of information flow in observed or temporal data, while guaranteeing interpretability and strong theoretical properties (Zhang et al., 16 Jan 2026, Jiang, 2024, Xu et al., 2020).
1. Causal Sinkhorn Discrepancy and Model Foundations
Causal Sinkhorn Discrepancy (CSD) extends the classical Wasserstein distance by integrating both a causality constraint and entropic regularization. For distributions $\mu$ on $\mathcal{X}$ and $\nu$ on $\mathcal{Y}$, CSD is defined as

$$\mathrm{CSD}_\varepsilon(\mu, \nu) = \inf_{\pi \in \Pi_c(\mu, \nu)} \int c(x, y) \, \mathrm{d}\pi(x, y) + \varepsilon \, H(\pi \mid \mu \otimes \nu),$$

where:
- $\Pi_c(\mu, \nu)$ is the set of causal couplings enforcing structural constraints, e.g., $\pi(\mathrm{d}y_{1:t} \mid x_{1:T}) = \pi(\mathrm{d}y_{1:t} \mid x_{1:t})$,
- $H(\pi \mid \mu \otimes \nu)$ is the relative entropy with respect to the product reference measure,
- $\varepsilon \ge 0$ controls the strength of regularization.
Setting $\varepsilon = 0$ recovers the causal-Wasserstein distance; positive $\varepsilon$ ensures the optimizer is absolutely continuous, favoring soft transport plans and facilitating numerically stable optimization (Zhang et al., 16 Jan 2026).
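The entropic transport value at the core of the CSD definition can be illustrated numerically with classical Sinkhorn matrix scaling; a minimal sketch on a two-point discrete example, omitting the causal constraint (so this computes plain entropic OT, the non-causal special case):

```python
import numpy as np

def sinkhorn_value(mu, nu, C, eps, n_iter=500):
    """Entropic OT value  <pi, C> + eps * KL(pi || mu x nu)  via Sinkhorn
    matrix scaling on discrete marginals mu, nu with cost matrix C."""
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)            # project onto the column-marginal constraint
        u = mu / (K @ v)              # project onto the row-marginal constraint
    pi = u[:, None] * K * v[None, :]  # transport plan diag(u) K diag(v)
    kl = np.sum(pi * np.log(pi / np.outer(mu, nu)))
    return float(np.sum(pi * C) + eps * kl)

mu = np.array([0.5, 0.5])
nu = np.array([0.3, 0.7])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
val = sinkhorn_value(mu, nu, C, eps=0.1)
print(val)  # transport-cost term tends to the Wasserstein value 0.2 as eps -> 0
```

As the comment notes, shrinking `eps` drives the value toward the unregularized optimal-transport cost, while larger `eps` spreads mass and increases the entropic penalty.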
The Causal-SDRO ambiguity set consists of all distributions within a CSD "ball" of radius $\rho$ around the empirical reference $\hat{P}_n$, subject to causal consistency, preventing implausible “back-door” couplings and respecting the direction of covariate-outcome relationships.
2. Primal and Dual Formulations
In Causal-SDRO, the robust policy seeks the decision rule that minimizes worst-case expected loss under all distributions within the CSD ambiguity set:

$$\min_{f \in \mathcal{F}} \; \sup_{Q : \, \mathrm{CSD}_\varepsilon(\hat{P}_n, Q) \le \rho} \mathbb{E}_Q\left[\ell(f(X), Y)\right],$$

where $\ell$ is a loss, $\hat{P}_n$ is the empirical distribution of the data, and $\mathcal{F}$ is a class of measurable functions (Zhang et al., 16 Jan 2026).
The strong dual reformulation hinges on Lagrangian relaxation and yields

$$\min_{f \in \mathcal{F}} \inf_{\lambda \ge 0} \left\{ \lambda \rho + \mathbb{E}_{\hat{P}_n}\!\left[\Phi_\lambda(f; X, Y)\right] \right\},$$

where the function $\Phi_\lambda$ involves an implicit log-sum-exp over outcomes and encodes the combined loss and transport cost. This duality guarantees no gap and a unique optimal multiplier $\lambda^*$ under mild regularity assumptions (Zhang et al., 16 Jan 2026, Jiang, 2024). The corresponding worst-case law is a mixture of Gibbs distributions and remains causally consistent.
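To make the log-sum-exp structure of such dual bounds concrete, here is a hedged Monte Carlo sketch for a fixed multiplier: the Gaussian reference cloud, quadratic loss, and the particular value of `lam` are illustrative assumptions, not the paper's exact construction, and the outer optimization over the multiplier is omitted.

```python
import numpy as np

def dual_objective(lam, rho, eps, xs, loss, cost, n_ref=256, seed=0):
    """Monte Carlo evaluation of a generic Sinkhorn-DRO dual bound
        lam*rho + lam*eps * mean_x log E_{z~ref(x)} exp((loss(z) - lam*cost(x,z)) / (lam*eps))
    for a FIXED multiplier lam. The reference cloud z ~ N(x, 1) is an
    assumed, illustrative choice."""
    rng = np.random.default_rng(seed)
    vals = []
    for x in xs:
        z = x + rng.standard_normal(n_ref)        # reference samples around x
        a = (loss(z) - lam * cost(x, z)) / (lam * eps)
        m = a.max()                               # stabilized log-mean-exp
        vals.append(m + np.log(np.mean(np.exp(a - m))))
    return lam * rho + lam * eps * np.mean(vals)

rng = np.random.default_rng(1)
xs = rng.standard_normal(50)                      # illustrative empirical sample
loss = lambda z: z ** 2                           # illustrative loss
cost = lambda x, z: (x - z) ** 2                  # quadratic transport cost
ub = dual_objective(lam=5.0, rho=0.1, eps=0.5, xs=xs, loss=loss, cost=cost)
print(ub)  # the bound grows linearly in the radius rho, with slope lam
```

The linear dependence on the radius (`lam * rho`) is visible directly in the first term; the inner log-mean-exp is the empirical analogue of the smoothed worst-case loss.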
3. Causality Constraints and Entropic Regularization
Causality constraints are formally codified so that any feasible coupling must have, at each time or structural node, dependencies consistent with a prescribed filtration or causal graph. In time series, this requires that the future of $x$ cannot influence the law of $y$ at any time $t$: formally, for all $t$,

$$\pi(\mathrm{d}y_{1:t} \mid x_{1:T}) = \pi(\mathrm{d}y_{1:t} \mid x_{1:t}),$$

where $x_{1:t} = (x_1, \ldots, x_t)$ and $y_{1:t} = (y_1, \ldots, y_t)$ (Xu et al., 2020).
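This conditional-independence condition can be checked mechanically on a discrete coupling. A minimal sketch with two time steps and binary variables (the kernels and the $T = 2$ setup are illustrative):

```python
import numpy as np

def is_causal(pi, tol=1e-9):
    """For a coupling array pi[x1, x2, y1], test the t=1 causality condition
    pi(y1 | x1, x2) == pi(y1 | x1): y1 may not look at the future value x2."""
    cond = pi / pi.sum(axis=2, keepdims=True)   # conditional law pi(y1 | x1, x2)
    return bool(np.all(np.abs(cond[:, 0, :] - cond[:, 1, :]) < tol))

p_x1 = np.array([0.5, 0.5])                      # law of x1
p_x2_given_x1 = np.array([[0.7, 0.3], [0.2, 0.8]])
k = np.array([[0.9, 0.1], [0.4, 0.6]])           # transition kernel for y1

# Causal coupling: y1 is drawn from x1 only.
causal = (p_x1[:, None, None] * p_x2_given_x1[:, :, None] * k[:, None, :])
# Anticipative coupling: y1 is drawn from the *future* value x2.
acausal = (p_x1[:, None, None] * p_x2_given_x1[:, :, None] * k[None, :, :])
print(is_causal(causal), is_causal(acausal))  # True False
```

The anticipative coupling has identical marginals but lets $y_1$ depend on $x_2$, which is exactly the "back-door" transport that $\Pi_c$ excludes.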
Entropic regularization, via the relative-entropy term $\varepsilon \, H(\pi \mid \mu \otimes \nu)$, renders the optimization strictly convex, enables efficient computation via Sinkhorn-Knopp matrix scaling, and provides soft control over the size of the ambiguity set (Jiang, 2024).
4. Sinkhorn Algorithm and Dynamic Implementation
The entropic-regularized causal transport problem admits a block-coordinate solution analogous to, but more general than, the classical Sinkhorn algorithm. Dual potentials are iteratively updated over the filtration, with each update corresponding to a Bregman (KL) projection that enforces causality:

$$\varphi^{(k+1)}(x) = -\varepsilon \log \int \exp\!\left(\frac{\psi^{(k)}(y) - c(x, y)}{\varepsilon}\right) \mathrm{d}\nu(y),$$

and a symmetric equation for $\psi^{(k+1)}$ using $\mu$ (Jiang, 2024).
The sequence converges to the optimal dual solution under mild assumptions, and the approach generalizes to discrete and continuous dynamics; it is applicable both in generative modeling (e.g., COT-GAN) and in policy learning with covariate-outcome pairs (Xu et al., 2020, Zhang et al., 16 Jan 2026).
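The alternating dual-potential updates can be sketched in the log domain. This sketch performs only the two classical marginal projections; the causal variant interleaves additional projections onto the filtration constraints, which are omitted here:

```python
import numpy as np

def lse(A, axis):
    """Numerically stable log-sum-exp along the given axis."""
    m = A.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(A - m).sum(axis=axis, keepdims=True))).squeeze(axis=axis)

def sinkhorn_potentials(mu, nu, C, eps, n_iter=300):
    """Alternate the dual-potential (Bregman/KL projection) updates
        phi(x) <- -eps * log sum_y nu(y) exp((psi(y) - c(x, y)) / eps)
        psi(y) <- -eps * log sum_x mu(x) exp((phi(x) - c(x, y)) / eps)
    and return the induced transport plan."""
    phi, psi = np.zeros(len(mu)), np.zeros(len(nu))
    log_mu, log_nu = np.log(mu), np.log(nu)
    for _ in range(n_iter):
        phi = -eps * lse((psi[None, :] - C) / eps + log_nu[None, :], axis=1)
        psi = -eps * lse((phi[:, None] - C) / eps + log_mu[:, None], axis=0)
    return np.exp((phi[:, None] + psi[None, :] - C) / eps) * np.outer(mu, nu)

mu, nu = np.array([0.5, 0.5]), np.array([0.3, 0.7])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
pi = sinkhorn_potentials(mu, nu, C, eps=0.5)
print(pi.sum(axis=1), pi.sum(axis=0))  # marginals recover mu and nu
```

Working with the potentials in the log domain avoids the under/overflow that the multiplicative scaling form suffers for small `eps`.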
5. Adversarial Minimax and Distributional Robustness
Causal-SDRO can be interpreted as performing distributionally robust optimization over all distributions in a causally constrained entropy ball. For generative modeling, this takes the form

$$\min_\theta \; \inf_{\pi \in \Pi_c(\mu, (g_\theta)_\# \zeta)} \mathbb{E}_\pi[c(x, y)] + \varepsilon \, H\!\left(\pi \mid \mu \otimes (g_\theta)_\# \zeta\right),$$

where $\Pi_c$ denotes the set of causal couplings, $g_\theta$ is the generator, and $\zeta$ is the latent noise law (Xu et al., 2020). In adversarial formulations, the cost function is augmented with causal-martingale penalties, often parameterized by neural networks and optimized in a min-max game.
The worst-case distribution within the CSD ball is a continuous mixture of Gibbs kernels, explicitly computable from the dual variables and underlying reference measure (Zhang et al., 16 Jan 2026). This structure ensures that, despite regularization, the resulting optimizer is interpretable and always respects the directionality of information transfer.
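A discrete sketch of such a Gibbs-mixture worst-case law: each empirical point tilts the reference weights exponentially by loss minus scaled transport cost, and the tilted kernels are mixed over the data. The uniform reference grid, quadratic loss/cost, and the fixed multiplier `lam` are illustrative assumptions.

```python
import numpy as np

def worst_case_mixture(xs, zs, nu, loss, cost, lam, eps):
    """Discrete worst-case law: for each empirical point x, tilt the reference
    weights nu(z) into a Gibbs kernel proportional to
        nu(z) * exp((loss(z) - lam * cost(x, z)) / (lam * eps)),
    then mix the normalized kernels uniformly over the data points.
    lam plays the role of an (already optimized) dual multiplier."""
    W = np.zeros((len(xs), len(zs)))
    for i, x in enumerate(xs):
        a = (loss(zs) - lam * cost(x, zs)) / (lam * eps)
        w = nu * np.exp(a - a.max())          # stabilized unnormalized weights
        W[i] = w / w.sum()
    return W.mean(axis=0)

zs = np.linspace(-3.0, 3.0, 61)               # reference support
nu = np.ones_like(zs) / len(zs)               # uniform reference weights
xs = np.array([-1.0, 0.0, 1.0])               # empirical points
q = worst_case_mixture(xs, zs, nu, loss=lambda z: z ** 2,
                       cost=lambda x, z: (x - z) ** 2, lam=4.0, eps=0.5)
print(q.sum())  # q is a probability vector
```

Because each mixture component is an explicit exponential tilt of the reference, the adversary's distribution is directly inspectable rather than an opaque optimization by-product.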
6. Decision Rule Representation and Tractable Optimization
The Soft Regression Forest (SRF) provides a parametric, differentiable, and interpretable class of decision functions that can approximate optimal policies in Causal-SDRO. Each SRF consists of full binary trees of depth $D$, with probabilistic splits at each internal node and convex combinations at the leaves, yielding a universal-approximating, tree-structured, and Lipschitz decision rule:

$$f_\theta(x) = \sum_{l=1}^{2^D} p_l(x) \, \beta_l, \qquad p_l(x) = \prod_{j \in \mathrm{path}(l)} \sigma\!\left(w_j^\top x + b_j\right)^{r_{l,j}} \left(1 - \sigma\!\left(w_j^\top x + b_j\right)\right)^{1 - r_{l,j}},$$

where $r_{l,j} \in \{0, 1\}$ indicates whether leaf $l$ lies in the right subtree of internal node $j$, with explicit formulas for gradients and smoothness constants (Zhang et al., 16 Jan 2026).
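The forward pass of a single soft tree can be sketched generically (this is a standard soft-decision-tree construction illustrating the node structure, with hypothetical parameter shapes, not the paper's exact parameterization):

```python
import numpy as np

def soft_tree(x, W, b, leaf_vals):
    """One soft binary tree of depth D: internal node j routes right with
    probability sigmoid(W[j] @ x + b[j]); a leaf's weight is the product of
    its path's routing probabilities, and the output is the weighted sum of
    leaf values. For D=2: W has shape (3, d), b (3,), leaf_vals (4,)."""
    n_leaves = len(leaf_vals)
    D = int(np.log2(n_leaves))
    gates = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # P(route right) per internal node
    out = 0.0
    for leaf in range(n_leaves):
        p, node = 1.0, 0
        for depth in range(D):
            bit = (leaf >> (D - 1 - depth)) & 1  # path bit: 1 = go right
            p *= gates[node] if bit else 1.0 - gates[node]
            node = 2 * node + 1 + bit            # heap-indexed child
        out += p * leaf_vals[leaf]
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
W, b = rng.standard_normal((3, 5)), rng.standard_normal(3)
y = soft_tree(x, W, b, leaf_vals=np.array([0.0, 1.0, 2.0, 3.0]))
print(y)  # a convex combination of the leaf values, so 0 <= y <= 3
```

The leaf probabilities sum to one by construction, which is what makes the output a convex combination and the rule globally bounded and smooth.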
Optimization of SRF-parameterized Causal-SDRO uses a stochastic compositional gradient algorithm, leveraging auxiliary variables to track the multi-level expectation structure. Under suitable regularity, the iterates converge to an approximate stationary point at a rate matching the complexity of standard stochastic methods (Zhang et al., 16 Jan 2026).
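The role of the auxiliary variables can be illustrated on a toy compositional problem, min over theta of f(E[g(theta; xi)]) with f(y) = y^2 and g(theta; xi) = theta + xi; the step-size schedules and the one-dimensional setup are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stochastic compositional gradient descent (SCGD) sketch: the auxiliary
# variable y maintains a running estimate of the inner expectation E[g],
# the role auxiliary variables play in the multi-level SDRO objective.
theta, y = 5.0, 0.0
for k in range(1, 20001):
    alpha, beta = 0.05 / k ** 0.75, 1.0 / k ** 0.5  # illustrative step sizes
    xi = rng.standard_normal()
    y = (1 - beta) * y + beta * (theta + xi)        # track inner expectation
    grad = 2.0 * y * 1.0                            # chain rule: f'(y) * dg/dtheta
    theta -= alpha * grad
print(theta)  # should approach the minimizer theta* = 0
```

Updating the slow-moving average `y` instead of re-estimating the inner expectation from scratch at each step is what keeps the per-iteration cost at a single sample.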
7. Empirical Results and Interpretability
Empirical studies on feature-based newsvendor, inventory substitution, and portfolio selection tasks demonstrate that SRF-Causal-SDRO offers superior out-of-sample prescriptiveness and lower uncertainty compared to classical ERM, neural networks, and traditional DRO. For instance, SRF-CSDRO achieves prescriptiveness ≈ 52% versus 10% for standard two-layer networks in nonlinear newsvendor problems, and outperforms equal-weight and conditional mean-variance (CMV) approaches in S&P 500 portfolio selection (Zhang et al., 16 Jan 2026).
Interpretability is intrinsic to SRF: global feature importance correlates highly (0.82) with permutation-based measures, and local attributions (e.g., empirical integrated gradients) match post-hoc SHAP values (correlation 0.99), but can be traced precisely to tree paths without external simulations.
References:
- (Zhang et al., 16 Jan 2026) Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach
- (Jiang, 2024) Duality of causal distributionally robust optimization: the discrete-time case
- (Xu et al., 2020) COT-GAN: Generating Sequential Data via Causal Optimal Transport