Nonconvex NegDRO: Robust Causal Invariance
- The paper introduces NegDRO, which leverages negative-weight nonconvex minimax optimization to reliably recover causal outcome models under additive interventions.
- It establishes identifiability conditions ensuring the sole linear predictor with equal risk across environments is the true causal coefficient under strict heterogeneity.
- The proposed gradient-based algorithm scales efficiently in high dimensions, outperforming exhaustive combinatorial searches found in traditional causal discovery methods.
Nonconvex Negative-Weight Distributionally Robust Optimization (NegDRO) is a continuous minimax framework for causal invariance learning, introduced to address causal discovery across heterogeneous environments under additive interventions. NegDRO extends classical group distributionally robust optimization by maximizing over weights that may be negative, which breaks convexity; yet under appropriate identifiability conditions it provably recovers the causal outcome model and exhibits strong computational properties. Unlike prior approaches, NegDRO avoids exhaustive combinatorial searches by leveraging nonconvex optimization with theoretical guarantees, scaling efficiently with the number of covariates and maintaining robustness where prior methods fail (Wang et al., 2024).
1. Formulation and Nonconvex Minimax Structure
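NegDRO operates on multi-environment data $\{(X^{e}, Y^{e})\}_{e \in \mathcal{E}}$, where each environment $e$ presents a squared-loss linear prediction risk for a coefficient vector $b$ as
$$R_e(b) = \mathbb{E}\big[(Y^{e} - (X^{e})^{\top} b)^2\big].$$
To characterize invariance via risk-equalization across environments, NegDRO defines an uncertainty set for environment weights parameterized by $\gamma \ge 0$:
$$\mathcal{U}(\gamma) = \Big\{ w \in \mathbb{R}^{|\mathcal{E}|} : \sum_{e \in \mathcal{E}} w_e = 1,\ \min_{e \in \mathcal{E}} w_e \ge -\gamma \Big\}.$$
The central optimization is
$$b^{\gamma} = \arg\min_{b} \max_{w \in \mathcal{U}(\gamma)} \sum_{e \in \mathcal{E}} w_e R_e(b).$$
With $\gamma = \infty$, invariance is strictly enforced: the inner maximum is finite only when $R_e(b)$ takes the same value in every environment, so the minimization effectively ranges over risk-equalizing predictors. Allowing negative $w_e$ leads to nonconvexity, as $\sum_{e} w_e R_e(b)$ can lack positive semidefinite curvature in $b$ when any $w_e < 0$. However, under specific identifiability conditions, all stationary points are globally optimal. This property distinguishes NegDRO from standard convex-concave DRO formulations (Wang et al., 2024).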
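For finite $\gamma$ the inner maximization is a linear program over the environment weights, and it has a simple closed-form solution. The sketch below assumes the weight set consists of weights that sum to one and are bounded below by $-\gamma$; the function names `env_risks` and `negdro_inner_max` are illustrative and do not come from the paper.

```python
import numpy as np


def env_risks(b, envs):
    """Empirical squared-loss risk of the linear predictor b in each environment.

    envs is a list of (X, Y) pairs, X of shape (n_e, d) and Y of shape (n_e,).
    """
    return np.array([np.mean((Y - X @ b) ** 2) for X, Y in envs])


def negdro_inner_max(risks, gamma):
    """Maximize sum_e w_e * risks[e] over weights with sum(w) = 1 and w_e >= -gamma.

    A linear objective over this polytope is maximized at a vertex: put the
    lower bound -gamma on every environment and the leftover mass on the
    environment with the largest risk.
    """
    risks = np.asarray(risks, dtype=float)
    m = risks.size
    w = np.full(m, -float(gamma))
    w[np.argmax(risks)] += 1.0 + gamma * m
    return float(w @ risks), w


# Usage with made-up risks for three environments.
risks = np.array([1.0, 2.0, 4.0])
value_dro, _ = negdro_inner_max(risks, gamma=0.0)      # gamma = 0: worst-case risk
value_neg, w_neg = negdro_inner_max(risks, gamma=0.5)  # negative weights allowed
```

Under the same assumption, the inner maximum equals the largest environment risk plus $\gamma$ times the summed gaps between that largest risk and every environment's risk, so a larger $\gamma$ penalizes any spread between environment risks more heavily.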
2. Identifiability Conditions under Additive Interventions
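The identifiability results assume a linear structural equation model (SEM) on the outcome $Y^{e}$ and covariates $X^{e}$ of each environment $e$. Heterogeneity across environments is introduced through additive, environment-specific interventions, with the requirement that the interventions do not act on the outcome directly, so all environment variation enters through the covariates.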
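Condition A (strict heterogeneity) requires the intervention-induced heterogeneity across environments to be strictly positive definite, with minimal eigenvalue bounded away from zero; it is both sufficient and, in the case where each intervention is one-sparse, nearly necessary for identifiability. Under this condition, the only linear predictor that achieves equal risk across environments is the true causal coefficient $\beta^{*}$. Thus, the risk-equalizing NegDRO solution coincides with $\beta^{*}$ under Condition A (Wang et al., 2024).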
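The risk-invariance property can be illustrated on a toy simulation. The sketch below assumes a three-variable chain X1 → Y → X2 with Gaussian noise and random additive shifts on the covariates only; this design, the sample size, and the shift strengths are illustrative choices, not the paper's experimental setup.

```python
import numpy as np


def simulate_env(n, shift_sd, rng):
    """Draw one environment from a toy linear SEM with additive interventions.

    Structure: X1 -> Y -> X2. The interventions are added to the noise of
    X1 and X2 only; the noise of Y has the same law in every environment.
    """
    x1 = rng.normal(size=n) + shift_sd * rng.normal(size=n)
    y = 1.0 * x1 + rng.normal(size=n)  # causal coefficient on X1 is 1
    x2 = y + rng.normal(size=n) + shift_sd * rng.normal(size=n)
    return np.column_stack([x1, x2]), y


def risk(b, X, Y):
    """Empirical squared-loss risk of the linear predictor b."""
    return np.mean((Y - X @ b) ** 2)


rng = np.random.default_rng(0)
envs = [simulate_env(20000, sd, rng) for sd in (0.0, 2.0)]

# X2 is a child of Y, so its coefficient in the causal outcome model is 0.
beta_star = np.array([1.0, 0.0])

# Pooled least squares for comparison; it also uses the non-causal covariate X2.
X_pool = np.vstack([X for X, _ in envs])
Y_pool = np.concatenate([Y for _, Y in envs])
b_ols, *_ = np.linalg.lstsq(X_pool, Y_pool, rcond=None)

risks_causal = [risk(beta_star, X, Y) for X, Y in envs]
risks_ols = [risk(b_ols, X, Y) for X, Y in envs]
gap_causal = abs(risks_causal[0] - risks_causal[1])
gap_ols = abs(risks_ols[0] - risks_ols[1])
```

In this toy design the causal coefficient leaves only the outcome noise as residual, so its risk is essentially the same in both environments, whereas the pooled least-squares fit leans on X2 and its risk changes with the intervention strength.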
3. Gradient-Based Optimization Algorithm
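To address non-differentiability in the maximization over the environment weights $w$, NegDRO employs a ridge-regularized objective for $w$, yielding a differentiable function of $b$ for a small ridge parameter. The ridge term makes the inner problem strongly concave, so its maximizer is unique, which permits a single-loop algorithm alternating between
- Weight maximization: compute the unique maximizing weights at the current iterate.
- Gradient descent in $b$: take a step against the gradient of the smoothed objective, where the gradient is the sum of the per-environment risk gradients weighted by the maximizing weights.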
The final estimate 3 is the iterate with minimal 4. A proximal or subgradient-based variant can be applied to the unpenalized objective. This iterative method consistently avoids the exponential cost of exhaustive search present in ICP, EILLS, and related invariant causal discovery approaches (Wang et al., 2024).
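The alternating scheme above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the ridge term is $(\mu/2)\lVert w \rVert_2^2$ and that the weight set consists of weights summing to one and bounded below by $-\gamma$, in which case the weight update is a Euclidean projection of the scaled risks onto that set. The names `project_shifted_simplex`, `ridge_weights`, and `negdro_gd`, along with the default step size and ridge parameter, are hypothetical, and the sketch returns the last iterate rather than selecting among iterates.

```python
import numpy as np


def project_shifted_simplex(v, gamma):
    """Euclidean projection of v onto {w : sum(w) = 1, w_e >= -gamma}."""
    m = v.size
    # After the shift u = w + gamma, u lies on a simplex of total mass 1 + gamma * m.
    total = 1.0 + gamma * m
    u = np.sort(v + gamma)[::-1]
    css = np.cumsum(u) - total
    k = np.arange(1, m + 1)
    rho = k[u - css / k > 0][-1]
    theta = css[rho - 1] / rho
    return np.maximum(v + gamma - theta, 0.0) - gamma


def ridge_weights(risks, gamma, mu):
    """Unique maximizer of sum_e w_e * risks[e] - (mu / 2) * ||w||^2 over the weight set."""
    return project_shifted_simplex(np.asarray(risks, dtype=float) / mu, gamma)


def negdro_gd(envs, gamma, mu=1e-2, step=1e-2, n_steps=2000):
    """Single loop: closed-form weight update, then one gradient step in b."""
    d = envs[0][0].shape[1]
    grams = [X.T @ X / len(Y) for X, Y in envs]
    xtys = [X.T @ Y / len(Y) for X, Y in envs]
    b = np.zeros(d)
    for _ in range(n_steps):
        risks = np.array([np.mean((Y - X @ b) ** 2) for X, Y in envs])
        w = ridge_weights(risks, gamma, mu)
        # Gradient of the smoothed objective: weighted sum of risk gradients,
        # each risk gradient being 2 * (Gram @ b - X'Y / n).
        grad = sum(w_e * 2.0 * (G @ b - c) for w_e, G, c in zip(w, grams, xtys))
        b = b - step * grad
    return b
```

Each iteration costs one pass over the environments plus a sort of the risk vector, with no search over covariate subsets. Because the objective is nonconvex, the step size and ridge parameter would need tuning on real data.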
4. Theoretical Guarantees
Assuming Lipschitz-continuous gradients for each environment risk, and with constants expressed through the minimal eigenvalue in Condition A and the minimal sample size per environment, NegDRO provides the following guarantees:
- Population bound for any 9: 0
- Finite-sample bound (with high probability): 1
- Stationary-point convergence: For step-size 2 and 3 steps,
4
so
5
Proximal or subgradient variants achieve the 6 rate to a generalized stationary point.
In limited-intervention regimes (only outcome-children perturbed), only a principal submatrix of 7 needs positivity, with degraded rates 8 and much longer required iteration 9. These theoretical results elucidate NegDRO's ability to attain causal identification and robust convergence in both population and finite-sample regimes (Wang et al., 2024).
5. Practical Performance and Empirical Insights
Simulation studies highlight multiple salient aspects of NegDRO's practical efficacy:
- Convergence in 0: For large 1, the error in the estimate scales as 2.
- Sample-size scaling: The estimation error decreases as $n^{-1/4}$ in the sample size $n$ (empirically, slope ≈ -1/4 on log–log plots).
- High-dimensional scalability: NegDRO solves problems with up to 4 covariates within seconds to minutes. In contrast, exhaustive search methods such as ICP and EILLS fail to complete within 30 minutes for 5–6.
- Robustness to intervention strength: When interventions are limited or weak, classical methods such as CausalDantzig (requires invertible Gram-matrix gaps) and DRIG (requires a reference environment) fail, but NegDRO still recovers the causal coefficient.
- Negative weights as invariance-enforcing: Allowing negative weights enables the optimizer to subtract non-causal environment risks, enforcing risk invariance; despite nonconvexity, simple gradient-based schemes reliably converge globally (Wang et al., 2024).
6. Relation to Prior Work and Significance
NegDRO generalizes classical group DRO, which constrains the environment weights to the probability simplex (nonnegative weights summing to one), as in Sagawa et al. (2019), but surmounts the limitations posed by convexity. Earlier invariance-based methods (e.g., ICP, EILLS) involve combinatorial searches over covariate subsets with exponential complexity, substantially limiting scalability. CausalDantzig [Rothenhäusler et al., Ann. Stat. 2019] and DRIG further rely on restrictive identifiability conditions (e.g., invertibility or reference environments), failing in weak or limited-intervention settings. NegDRO, by combining negative weighting and nonconvex minimax optimization, achieves polynomial scalability and theoretical recovery guarantees in a broad array of intervention regimes, significantly broadening the applicability of causal invariance approaches (Wang et al., 2024).