Nonconvex NegDRO: Robust Causal Invariance

Updated 10 June 2026

The paper introduces NegDRO, which leverages negative-weight nonconvex minimax optimization to reliably recover causal outcome models under additive interventions.
It establishes identifiability conditions ensuring the sole linear predictor with equal risk across environments is the true causal coefficient under strict heterogeneity.
The proposed gradient-based algorithm scales efficiently in high dimensions, outperforming exhaustive combinatorial searches found in traditional causal discovery methods.

Nonconvex Negative-Weight Distributionally Robust Optimization (NegDRO) is a continuous minimax framework for causal invariance learning, introduced to address causal discovery across heterogeneous environments under additive interventions. NegDRO extends classical group distributionally robust optimization by maximizing over weights that may be negative, which breaks convexity. Under appropriate identifiability conditions, it nevertheless provably recovers the causal outcome model and exhibits strong computational properties. Unlike earlier approaches, NegDRO avoids exhaustive combinatorial searches by leveraging nonconvex optimization with theoretical guarantees, scaling efficiently with the number of covariates and remaining robust where those methods fail (Wang et al., 2024).

1. Formulation and Nonconvex Minimax Structure

NegDRO operates on data from multiple environments indexed by $\mathcal{E} = \{1, \dots, E\}$, where each environment $e$ has a squared-loss linear prediction risk for $b\in\mathbb{R}^p$ given by

$R_e(b) = \mathbb{E}\bigl[(Y^{(e)} - b^\top X^{(e)})^2\bigr].$
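As a minimal sketch, the per-environment risk can be estimated by its sample average. The function name `env_risks` and the representation of the data as a list of `(X, y)` array pairs are assumptions made for illustration; they do not come from the source.

```python
import numpy as np

def env_risks(b, envs):
    """Empirical squared-loss risk R_e(b) for every environment.

    envs is a list of (X, y) pairs (hypothetical layout): X has shape
    (n_e, p) and y has shape (n_e,) for environment e.
    """
    return np.array([np.mean((y - X @ b) ** 2) for X, y in envs])

# Tiny illustrative data set with two environments and p = 2.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
envs = [(X, np.array([1.0, 2.0])), (X, np.array([2.0, 4.0]))]

risks_at_zero = env_risks(np.zeros(2), envs)          # mean of y**2 per environment
risks_at_fit = env_risks(np.array([1.0, 2.0]), envs)  # zero risk in the first environment
```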

To characterize invariance via risk-equalization across environments, NegDRO defines an uncertainty set for environment weights parameterized by $\gamma \ge 0$:

$\mathcal{U}(\gamma) = \left\{w \in \mathbb{R}^E \,:\, \sum_{e=1}^E w_e = 1,\; \min_e w_e \ge -\gamma\right\}.$

The central optimization is

$b_{\mathrm{Neg}}^\gamma = \underset{b \in \mathbb{R}^p}{\arg\min} \underset{w \in \mathcal{U}(\gamma)}{\max} \sum_{e=1}^E w_e R_e(b). \tag{1}$
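Because the inner objective in (1) is linear in $w$ and $\mathcal{U}(\gamma)$ is a shifted, rescaled simplex, the inner maximum is attained at a vertex and has the closed form $(1 + E\gamma)\max_e R_e(b) - \gamma \sum_e R_e(b)$, which equals $\max_e R_e(b) + \gamma \sum_e \bigl(\max_{e'} R_{e'}(b) - R_e(b)\bigr)$. This follows from the definition of $\mathcal{U}(\gamma)$ alone; the sketch below checks it against a brute-force enumeration of the vertices. The function names are hypothetical.

```python
import numpy as np

def inner_max_closed_form(risks, gamma):
    """max over w in U(gamma) of sum_e w_e * risks[e], in closed form."""
    risks = np.asarray(risks, dtype=float)
    E = risks.size
    return (1.0 + E * gamma) * risks.max() - gamma * risks.sum()

def inner_max_vertices(risks, gamma):
    """Same maximum, found by enumerating the E vertices of U(gamma).

    Vertex k puts weight -gamma on every environment except k, which
    receives the remaining mass 1 + (E - 1) * gamma.
    """
    risks = np.asarray(risks, dtype=float)
    E = risks.size
    best = -np.inf
    for k in range(E):
        w = np.full(E, -gamma)
        w[k] += 1.0 + E * gamma
        best = max(best, float(w @ risks))
    return best

risks = [1.0, 2.0, 4.0]
value_group_dro = inner_max_closed_form(risks, gamma=0.0)  # 4.0, the worst-case risk
value_negdro = inner_max_closed_form(risks, gamma=0.5)     # 6.5 = 4 + 0.5 * (3 + 2 + 0)
```

Read this way, $\gamma$ acts as a penalty on the gaps between each environment's risk and the largest one, which is consistent with the statement that large $\gamma$ pushes the solution toward equal risks.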

With $\gamma \to \infty$, invariance is strictly enforced:

$b_{\mathrm{Neg}}^\infty = \underset{b:\, R_1(b)=\cdots=R_E(b)}{\arg\min} R_e(b). \tag{2}$

Allowing negative $w_e$ leads to nonconvexity, since $\sum_{e=1}^E w_e R_e(b)$ can lose positive semidefinite curvature in $b$ when any $w_e < 0$. However, under specific identifiability conditions, all stationary points are globally optimal. This property distinguishes NegDRO from standard convex-concave DRO formulations (Wang et al., 2024).
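The loss of convexity can be checked numerically. For squared loss, the Hessian of $\sum_e w_e R_e(b)$ in $b$ is $2\sum_e w_e\, \mathbb{E}[X^{(e)} X^{(e)\top}]$, so a negative weight on an environment with a large second-moment matrix can produce a negative eigenvalue. The two diagonal second-moment matrices below are made-up values chosen only to illustrate this.

```python
import numpy as np

def weighted_hessian(weights, grams):
    """Hessian in b of sum_e w_e * R_e(b) for squared loss: 2 * sum_e w_e * G_e."""
    return 2.0 * sum(w * G for w, G in zip(weights, grams))

# Hypothetical second-moment matrices E[X X^T] for two environments.
grams = [np.diag([1.0, 1.0]), np.diag([1.0, 4.0])]

w_simplex = np.array([0.5, 0.5])    # in U(0): ordinary group-DRO weights
w_negative = np.array([2.0, -1.0])  # in U(1): sums to one, minimum weight is -1

eig_simplex = np.linalg.eigvalsh(weighted_hessian(w_simplex, grams))    # eigenvalues 2 and 5
eig_negative = np.linalg.eigvalsh(weighted_hessian(w_negative, grams))  # eigenvalues -4 and 2
```

With simplex weights the Hessian is positive definite, while the negative-weight combination is indefinite, so the weighted objective is no longer convex in $b$.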

2. Identifiability Conditions under Additive Interventions

The identifiability results assume a linear structural equations model (SEM) on $e$ 3, with

$e$ 4

where the environment-specific model is $e$ 5. Heterogeneity is introduced by

$e$ 6

with the requirement $e$ 7, so all environment-variation is in $e$ 8.

Condition A (strict heterogeneity) prescribes: $e$ 9 which is both sufficient and, in the case where each $b\in\mathbb{R}^p$ 0 is one-sparse, nearly necessary for identifiability. The only linear predictor that achieves equal risk across environments is the true causal coefficient $b\in\mathbb{R}^p$ 1. Thus, $b\in\mathbb{R}^p$ 2 under Condition A (Wang et al., 2024).
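The risk-invariance property can be illustrated on a toy simulation. The structural model below (one parent `x1` and one child `x2` of the outcome, Gaussian noise, and additive random shifts applied to the covariates only) is an assumption chosen for illustration and is not the model specification from the source. It compares the true causal coefficient with a pooled least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_env(n, shift_x1, shift_x2):
    """Toy linear SEM x1 -> y -> x2 with additive shifts on x1 and x2 only."""
    x1 = rng.normal(size=n) + shift_x1 * rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(size=n)  # outcome equation is the same in every environment
    x2 = y + rng.normal(size=n) + shift_x2 * rng.normal(size=n)
    return np.column_stack([x1, x2]), y

def env_risks(b, envs):
    """Empirical squared-loss risk per environment."""
    return np.array([np.mean((y - X @ b) ** 2) for X, y in envs])

# Two environments that differ only in the strength of the shifts.
envs = [simulate_env(100_000, 0.0, 0.0), simulate_env(100_000, 1.0, 2.0)]

b_causal = np.array([2.0, 0.0])
X_pool = np.vstack([X for X, _ in envs])
y_pool = np.concatenate([y for _, y in envs])
b_ols = np.linalg.lstsq(X_pool, y_pool, rcond=None)[0]

causal_risks = env_risks(b_causal, envs)  # both close to the outcome noise variance, 1
ols_risks = env_risks(b_ols, envs)        # smaller on average, but unequal across environments
```

In this toy model the pooled fit uses the child `x2` and achieves lower average risk (roughly 0.63 and 0.88 in the two environments), but its risk changes with the intervention strength, whereas the causal coefficient has the same risk in both environments.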

3. Gradient-Based Optimization Algorithm

To address non-differentiability in the maximization over $b\in\mathbb{R}^p$ 3, NegDRO employs a ridge-regularized objective for $b\in\mathbb{R}^p$ 4, yielding a differentiable function $b\in\mathbb{R}^p$ 5: $b\in\mathbb{R}^p$ 6 for small $b\in\mathbb{R}^p$ 7. The unique maximizer $b\in\mathbb{R}^p$ 8 permits a single-loop algorithm alternating between

Weight maximization: $b\in\mathbb{R}^p$ 9
Gradient descent in $R_e(b) = \mathbb{E}\bigl[(Y^{(e)} - b^\top X^{(e)})^2\bigr].$ 0: $R_e(b) = \mathbb{E}\bigl[(Y^{(e)} - b^\top X^{(e)})^2\bigr].$ 1, where the gradient is

$R_e(b) = \mathbb{E}\bigl[(Y^{(e)} - b^\top X^{(e)})^2\bigr].$ 2

The final estimate $R_e(b) = \mathbb{E}\bigl[(Y^{(e)} - b^\top X^{(e)})^2\bigr].$ 3 is the iterate with minimal $R_e(b) = \mathbb{E}\bigl[(Y^{(e)} - b^\top X^{(e)})^2\bigr].$ 4. A proximal or subgradient-based variant can be applied to the unpenalized objective. This iterative method consistently avoids the exponential cost of exhaustive search present in ICP, EILLS, and related invariant causal discovery approaches (Wang et al., 2024).
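The subgradient variant on the unpenalized objective mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the constant step size, the zero initialization, and the rule of returning the iterate with the smallest worst-case objective are choices made here. The inner maximizer is taken at a vertex of $\mathcal{U}(\gamma)$, and the ridge-regularized smoothing is not implemented.

```python
import numpy as np

def worst_case_weights(risks, gamma):
    """A vertex of U(gamma) that maximizes sum_e w_e * risks[e]."""
    E = len(risks)
    w = np.full(E, -gamma)
    w[np.argmax(risks)] += 1.0 + E * gamma
    return w

def negdro_subgradient(envs, gamma=0.5, step=5e-3, n_steps=2000):
    """Subgradient descent on b -> max_{w in U(gamma)} sum_e w_e R_e(b)."""
    p = envs[0][0].shape[1]
    grams = [X.T @ X / len(y) for X, y in envs]  # estimates of E[X X^T]
    cross = [X.T @ y / len(y) for X, y in envs]  # estimates of E[X Y]
    second = [np.mean(y ** 2) for _, y in envs]  # estimates of E[Y^2]

    def risks(b):
        return np.array([s - 2.0 * c @ b + b @ G @ b
                         for G, c, s in zip(grams, cross, second)])

    def objective(b):
        r = risks(b)
        return float(worst_case_weights(r, gamma) @ r)

    b = np.zeros(p)
    best_b, best_val = b.copy(), objective(b)
    for _ in range(n_steps):
        w = worst_case_weights(risks(b), gamma)
        # Subgradient: weighted sum of the risk gradients 2 * (G_e b - c_e).
        grad = sum(w_e * 2.0 * (G @ b - c) for w_e, G, c in zip(w, grams, cross))
        b = b - step * grad
        val = objective(b)
        if val < best_val:
            best_b, best_val = b.copy(), val
    return best_b, best_val

# Usage on synthetic data (hypothetical toy model with shifts on covariates only).
rng = np.random.default_rng(1)

def simulate_env(n, s1, s2):
    x1 = rng.normal(size=n) + s1 * rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(size=n)
    x2 = y + rng.normal(size=n) + s2 * rng.normal(size=n)
    return np.column_stack([x1, x2]), y

envs = [simulate_env(2000, 0.0, 0.0), simulate_env(2000, 1.0, 2.0)]
b_hat, val_hat = negdro_subgradient(envs, gamma=0.5)
```

Each iteration costs a handful of $p \times p$ matrix-vector products, in contrast to enumerating subsets of covariates. The sketch makes no convergence claim of its own; the step size and iteration count would need tuning for real data.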

4. Theoretical Guarantees

Assuming $R_e(b) = \mathbb{E}\bigl[(Y^{(e)} - b^\top X^{(e)})^2\bigr].$ 5-Lipschitz gradients for each $R_e(b) = \mathbb{E}\bigl[(Y^{(e)} - b^\top X^{(e)})^2\bigr].$ 6 and denoting by $R_e(b) = \mathbb{E}\bigl[(Y^{(e)} - b^\top X^{(e)})^2\bigr].$ 7 the minimal eigenvalue in Condition A and $R_e(b) = \mathbb{E}\bigl[(Y^{(e)} - b^\top X^{(e)})^2\bigr].$ 8 (minimal sample size per environment), NegDRO provides the following guarantees:

Population bound for any $R_e(b) = \mathbb{E}\bigl[(Y^{(e)} - b^\top X^{(e)})^2\bigr].$ 9: $\gamma \ge 0$ 0
Finite-sample bound (with high probability): $\gamma \ge 0$ 1
Stationary-point convergence: For step-size $\gamma \ge 0$ 2 and $\gamma \ge 0$ 3 steps,

$\gamma \ge 0$ 4

$\gamma \ge 0$ 5

Proximal or subgradient variants achieve the $\gamma \ge 0$ 6 rate to a generalized stationary point.

In limited-intervention regimes (only outcome-children perturbed), only a principal submatrix of $\gamma \ge 0$ 7 needs positivity, with degraded rates $\gamma \ge 0$ 8 and much longer required iteration $\gamma \ge 0$ 9. These theoretical results elucidate NegDRO's ability to attain causal identification and robust convergence in both population and finite-sample regimes (Wang et al., 2024).

5. Practical Performance and Empirical Insights

Simulation studies highlight multiple salient aspects of NegDRO's practical efficacy:

Convergence in $\mathcal{U}(\gamma) = \left\{w \in \mathbb{R}^E \,:\, \sum_{e=1}^E w_e = 1,\; \min_e w_e \ge -\gamma\right\}.$ 0: For large $\mathcal{U}(\gamma) = \left\{w \in \mathbb{R}^E \,:\, \sum_{e=1}^E w_e = 1,\; \min_e w_e \ge -\gamma\right\}.$ 1, the error in the estimate scales as $\mathcal{U}(\gamma) = \left\{w \in \mathbb{R}^E \,:\, \sum_{e=1}^E w_e = 1,\; \min_e w_e \ge -\gamma\right\}.$ 2.
Sample-size scaling: The estimation error decreases as $\mathcal{U}(\gamma) = \left\{w \in \mathbb{R}^E \,:\, \sum_{e=1}^E w_e = 1,\; \min_e w_e \ge -\gamma\right\}.$ 3 (empirically, slope ≈ -1/4 on log–log plots).
High-dimensional scalability: NegDRO solves problems with up to $\mathcal{U}(\gamma) = \left\{w \in \mathbb{R}^E \,:\, \sum_{e=1}^E w_e = 1,\; \min_e w_e \ge -\gamma\right\}.$ 4 covariates within seconds to minutes. In contrast, exhaustive search methods such as ICP and EILLS fail to complete within 30 minutes for $\mathcal{U}(\gamma) = \left\{w \in \mathbb{R}^E \,:\, \sum_{e=1}^E w_e = 1,\; \min_e w_e \ge -\gamma\right\}.$ 5– $\mathcal{U}(\gamma) = \left\{w \in \mathbb{R}^E \,:\, \sum_{e=1}^E w_e = 1,\; \min_e w_e \ge -\gamma\right\}.$ 6.
Robustness to intervention strength: When interventions are limited or weak, classical methods such as CausalDantzig (requires invertible Gram-matrix gaps) and DRIG (requires a reference environment) fail, but NegDRO still recovers the true causal coefficient.
Negative weights as invariance-enforcing: Allowing $w_e < 0$ enables the optimizer to subtract non-causal environment risks, enforcing risk invariance; despite nonconvexity, simple gradient-based schemes reliably converge globally (Wang et al., 2024).

6. Relation to Prior Work and Significance

NegDRO generalizes classical group DRO, which constrains $w$ to the simplex ($\gamma = 0$), as in Sagawa et al. (2019), and moves beyond the convex setting that this constraint imposes. Earlier invariance-based methods (e.g., ICP, EILLS) involve combinatorial searches over covariate subsets with exponential complexity, substantially limiting scalability. CausalDantzig [Rothenhäusler et al., Ann. Stat. 2019] and DRIG further rely on restrictive identifiability conditions (e.g., invertibility or reference environments), failing in weak or limited-intervention settings. NegDRO, by combining negative weighting and nonconvex minimax optimization, achieves polynomial scalability and theoretical recovery guarantees in a broad array of intervention regimes, significantly broadening the applicability of causal invariance approaches (Wang et al., 2024).


References

Wang et al. (2024). Causal Invariance Learning via Efficient Optimization of a Nonconvex Objective.

Sagawa et al. (2019). Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization.
