
Stochastic Dual Dynamic Programming

Updated 11 December 2025
  • SDDP is an algorithm that constructs piecewise-affine approximations of value functions in large-scale multistage stochastic convex programs.
  • It employs forward–backward passes to solve Bellman recursions efficiently under convexity, stagewise independence, and related assumptions.
  • Extensions of SDDP cover risk-averse, Markovian, random-horizon, and nonconvex cases, with applications in energy planning, portfolio optimization, and supply chain management.

Stochastic Dual Dynamic Programming (SDDP) is a cutting-plane approximate dynamic programming algorithm designed to efficiently solve large-scale multistage stochastic convex programs, particularly those with high-dimensional state spaces, recourse decisions that link stages, or stagewise-coupled uncertainties. SDDP proceeds via a two-pass (forward–backward) iterative scheme, constructing a piecewise-affine lower-bounding model of the value (cost-to-go) functions through the accumulation of Benders-type cuts. The algorithm operates under fundamental structural assumptions such as convexity, relatively complete recourse, and, in the standard formulation, stagewise independence of the uncertainty. Over the past decade, SDDP's core methodology has been extended to cover nonlinear, risk-averse, nonconvex, Markovian, regularized, inexact, and infinite-horizon formulations. SDDP and its variants have seen extensive adoption in energy planning, portfolio optimization, supply chain management, and infrastructure management.

1. Mathematical Formulation and Bellman Recursion

The canonical SDDP setting is the $T$-stage risk-neutral multistage stochastic convex program:

$$\min_{x_1\in X_1} \;\mathbb{E}_{\xi_1}\Bigl[f_1(x_1,\xi_1) + \mathbb{E}_{\xi_2}\bigl[\min_{x_2\in X_2(x_1,\xi_2)} f_2(x_2,\xi_2) + \cdots \bigr] \Bigr]$$

where at each stage $t$:

  • $\xi_t$ is the exogenous random data (discrete or continuous, often stagewise independent).
  • $x_t \in X_t(x_{t-1},\xi_t)$ is the recourse/control variable, with $x_0$ given.
  • $f_t$ is the stage cost; all sets and functions are convex.

The dynamic programming recursion (the Bellman equations) takes the form:

$$V_T(x_{T-1}) = \mathbb{E}_{\xi_T}\Bigl[\min_{x_T\in X_T(x_{T-1},\xi_T)} f_T(x_T,\xi_T)\Bigr]$$

$$V_t(x_{t-1}) = \mathbb{E}_{\xi_t}\Bigl[\min_{x_t\in X_t(x_{t-1},\xi_t)} \bigl\{f_t(x_t,\xi_t) + V_{t+1}(x_t)\bigr\}\Bigr],\quad t = T-1,\dots,1$$

where $V_t(\cdot)$ are the cost-to-go functions, convex and polyhedral under suitable assumptions (Lan, 2019, Pacaud et al., 2022).
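
To make the recursion concrete, the sketch below solves the Bellman equations exactly on a small discretized problem: a single reservoir with storage $x_t$, random inflow $\xi_t$, and a convex stage cost. All model data are illustrative assumptions, not taken from any cited paper.

```python
import numpy as np

# Exact backward Bellman recursion on a discretized toy reservoir problem.
T = 4
levels = np.linspace(0.0, 10.0, 101)     # discretized state grid for x_t
inflows = np.array([0.0, 2.0, 4.0])      # support of xi_t (stagewise independent)
probs = np.array([0.3, 0.4, 0.3])
demand, spot_price = 3.0, 5.0

V = np.zeros((T + 2, len(levels)))       # V[T+1] is the zero terminal function
for t in range(T, 0, -1):                # backward sweep over stages
    for i, x_prev in enumerate(levels):
        exp_cost = 0.0
        for xi, p in zip(inflows, probs):
            avail = x_prev + xi                     # water available this stage
            feas = levels[levels <= avail]          # feasible next states x_t
            release = avail - feas                  # implied release decision
            # convex stage cost: buy any shortfall at the spot price
            stage = spot_price * np.maximum(demand - release, 0.0)
            v_next = np.interp(feas, levels, V[t + 1])
            exp_cost += p * np.min(stage + v_next)  # inner min, then expectation
        V[t, i] = exp_cost

print("V_1 at x_0 = 5:", V[1, np.searchsorted(levels, 5.0)])
```

On a one-dimensional state this enumeration is instant, but the grid grows exponentially with the state dimension; SDDP's cut-based approximation, described next, avoids gridding the state space altogether.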

In random-horizon extensions, a stopping time $T:\Omega\to\{2,\dots,T_{\max}\}$ is introduced, leading to augmented Bellman recursions with a "death" indicator $D_{t-1}$ and cost modification $D_{t-1} f_t(x_t,x_{t-1},\xi_t)$ (Guigues, 2018).

2. SDDP Algorithm: Forward–Backward Passes and Cut Management

SDDP operates by iteratively constructing polyhedral underestimators for the value functions $V_t$. The core steps at iteration $k$ are:

Forward Pass:

  • Sample a scenario (or multiple in parallel), constituting a realization $(\tilde\xi_1,\dots,\tilde\xi_T)$ (and the associated indicator process in random-horizon variants).
  • For $t=1,\dots,T$ recursively solve:

$$x_t^k = \arg\min_{x\in X_t(x_{t-1}^k,\tilde\xi_t)} f_t(x,x_{t-1}^k,\tilde\xi_t) + V_{t+1}^{k-1}(x)$$

  • Record the trial states $x_t^k$ and costs.

Backward Pass:

  • For $t=T,\dots,2$ (in reverse), and for each realization $\xi_{tj}$ in the support of $\xi_t$:

    • Solve the subproblem:

$$Q_t^k(x_{t-1}^k,\xi_{tj}) = \min_{x_t\in X_t(x_{t-1}^k,\xi_{tj})} f_t(x_t,x_{t-1}^k,\xi_{tj}) + V_{t+1}^k(x_t)$$

    • Obtain optimal dual multipliers and compute a subgradient $\beta_{tj}^k$; set the intercept $\theta_{tj}^k$.

  • Aggregate cuts (by probability weighting) and update $V_t^k$ as the maximum of all existing and new cuts:

$$V_t^k(x) = \max\bigl\{ V_t^{k-1}(x),\; \theta_t^k + \langle\beta_t^k,\, x-x_{t-1}^k\rangle \bigr\}$$

This process is repeated until the gap between the deterministic lower bound and a (typically statistical) upper bound falls below a pre-specified tolerance, or another stopping rule is satisfied (Lan, 2019, Vassos et al., 3 May 2025, Pacaud et al., 2022).
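
The sketch below implements this forward–backward loop end-to-end on a toy one-dimensional reservoir problem, using `scipy.optimize.linprog` (HiGHS) for the stage LPs and the equality-constraint duals as cut slopes. The model data, the `solve_stage` helper, and the cut-management choice (one probability-weighted cut per stage per iteration) are illustrative assumptions, not a reference implementation of any cited variant.

```python
import numpy as np
from scipy.optimize import linprog

# Toy model: storage x in [0, xmax]; decisions: release u, spill s, thermal
# generation g (unit cost c); dynamics x = x_prev + xi - u - s; demand
# balance u + g = d; terminal value V_{T+1} = 0. All data are illustrative.
T, xmax, c, d, x0 = 4, 10.0, 5.0, 3.0, 5.0
xi_vals, xi_probs = np.array([0.0, 2.0, 4.0]), np.array([0.3, 0.4, 0.3])
rng = np.random.default_rng(0)
# cuts[t] under-approximates V_t; the initial (0, 0, 0) cut encodes phi >= 0
cuts = {t: [(0.0, 0.0, 0.0)] for t in range(2, T + 2)}

def solve_stage(x_prev, xi, stage_cuts):
    # variables z = [x, u, s, g, phi]; minimize c*g + phi
    obj = [0.0, 0.0, 0.0, c, 1.0]
    A_eq = [[1.0, 1.0, 1.0, 0.0, 0.0],   # x + u + s = x_prev + xi (mass balance)
            [0.0, 1.0, 0.0, 1.0, 0.0]]   # u + g = d (demand balance)
    b_eq = [x_prev + xi, d]
    # each cut (theta, beta, xbar) imposes phi >= theta + beta * (x - xbar)
    A_ub = [[beta, 0.0, 0.0, 0.0, -1.0] for (theta, beta, xbar) in stage_cuts]
    b_ub = [beta * xbar - theta for (theta, beta, xbar) in stage_cuts]
    bounds = [(0, xmax), (0, None), (0, None), (0, None), (None, None)]
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    # dual of the mass-balance row = subgradient of the stage value in x_prev
    return res.x[0], res.fun, res.eqlin.marginals[0]

for k in range(30):
    # ---- forward pass: sample one scenario and record trial states x_t^k
    xs, x_prev = [x0], x0
    for t in range(1, T + 1):
        xi = rng.choice(xi_vals, p=xi_probs)
        x_t, _, _ = solve_stage(x_prev, xi, cuts[t + 1])
        xs.append(x_t)
        x_prev = x_t
    # ---- backward pass: add one probability-weighted (averaged) cut per stage
    for t in range(T, 1, -1):
        theta = beta = 0.0
        for xi, p in zip(xi_vals, xi_probs):
            _, q, slope = solve_stage(xs[t - 1], xi, cuts[t + 1])
            theta += p * q
            beta += p * slope
        cuts[t].append((theta, beta, xs[t - 1]))

# deterministic lower bound: expected first-stage value under the cut model
lb = sum(p * solve_stage(x0, xi, cuts[2])[1] for xi, p in zip(xi_vals, xi_probs))
print("lower bound after 30 iterations:", lb)
```

Because each backward sweep only adds cuts, the lower bound is monotonically nondecreasing across iterations; a statistical upper bound (omitted here) can be estimated by averaging accumulated stage costs over many forward-pass scenarios.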

3. Extensions, Variants, and Theoretical Properties

3.1 Conditional Cuts and Markovian Dynamics

When the stagewise independence assumption is relaxed (e.g., $\{\xi_t\}$ forms a Markov chain), SDDP with conditional cuts estimates the coefficients $\alpha_t(\xi_{t-1})$ and $\beta_t(\xi_{t-1})$ as functions of the previous stage's uncertainty. These coefficients are computed as conditional expectations, typically via regression over a suitable function basis (e.g., local affine regression), naturally accommodating path dependence without inflating the physical state space (Van-Ackooij et al., 2017). This approach retains the decomposition and tractability properties of SDDP and achieves almost-sure convergence, provided the regression error vanishes asymptotically.
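
As a concrete illustration of the regression step, the sketch below fits the cut slope $\beta_t$ as an affine function of the previous-stage noise $\xi_{t-1}$ from synthetic samples; the actual method of Van-Ackooij et al. (2017) uses local affine regression over a richer basis, and all data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
xi_prev = rng.uniform(0.0, 4.0, size=50)                   # samples of xi_{t-1}
beta = -2.0 - 0.5 * xi_prev + 0.05 * rng.normal(size=50)   # synthetic cut slopes

# design matrix [1, xi]; solve min ||A w - beta||^2 for w = (intercept, slope)
A = np.column_stack([np.ones_like(xi_prev), xi_prev])
w, *_ = np.linalg.lstsq(A, beta, rcond=None)

def beta_hat(xi):
    # estimated conditional cut slope beta_t(xi_{t-1})
    return w[0] + w[1] * xi

print("estimated slope at xi_{t-1} = 2:", beta_hat(2.0))
```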

3.2 Random-Horizon Problems

For processes with a random stopping time $T$, SDDP is adapted by augmenting the state with a Boolean death indicator $D_{t-1}$, resulting in cut approximations over the augmented state $(x, D_{t-1})$, with recourse costs nullified beyond the stopping event. Random-horizon SDDP requires generating cuts at all stages up to $T_{\max}$ but typically converges faster and saves CPU time, as forward passes are often truncated early (Guigues, 2018).
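
A minimal sketch of the forward-pass truncation that drives these CPU savings, under an assumed (illustrative) geometric stopping model and a generic per-stage `step` callable:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_stopping_time(t_max, p_stop=0.2):
    # illustrative geometric stopping time on {2, ..., t_max}
    for t in range(2, t_max + 1):
        if rng.random() < p_stop:
            return t
    return t_max

def forward_pass_random_horizon(x0, t_max, step):
    # `step(t, x)` is an assumed callable solving the stage-t subproblem and
    # returning the next trial state; the pass stops at the sampled horizon
    # instead of always running to t_max, which is where the savings come from
    horizon = sample_stopping_time(t_max)
    x = x0
    for t in range(1, horizon + 1):
        x = step(t, x)
    return horizon, x

# usage with a dummy step that merely decays the state
print(forward_pass_random_horizon(5.0, 10, step=lambda t, x: 0.9 * x))
```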

3.3 Complexity, Regularization, and Inexactness

Theoretical analysis shows that SDDP's iteration complexity grows polynomially with the number of stages $T$ (linear in $T$ for discounted problems), but exponentially with the per-stage state dimension $n_t$ (Lan, 2019). Regularization schemes (e.g., SDDP-REG), which introduce stage- or iteration-dependent proximal terms in the forward pass, have demonstrated significant performance improvements by stabilizing trial points and accelerating convergence (Guigues et al., 2017). Inexact SDDP and related variants allow primal and dual subproblem solves to be carried out up to bounded or vanishing errors, maintaining convergence guarantees with explicit error bounds (Guigues, 2018).
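
One plausible form of the regularized forward-pass problem (a sketch of the SDDP-REG idea; the exact prox center and parameter schedule vary across variants):

$$x_t^k \in \arg\min_{x\in X_t(x_{t-1}^k,\tilde\xi_t)} \Bigl\{ f_t(x,x_{t-1}^k,\tilde\xi_t) + V_{t+1}^{k-1}(x) + \frac{\rho_k}{2}\,\bigl\|x - x_t^{k-1}\bigr\|^2 \Bigr\}$$

where $x_t^{k-1}$ is the previous iteration's trial point and $\rho_k \ge 0$ is a (possibly vanishing) proximal weight. Since the penalty only affects trial-point selection, the backward pass and the validity of the cuts are unchanged.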

3.4 Nonlinear, Mixed-Integer, and Infinite Horizon

SDDP has been generalized to convex nonlinear stochastic programs (using subgradients from convex duality), multistage stochastic mixed-integer nonlinear programs with generalized conjugacy cuts (ensuring global optimality in subclasses), and infinite-horizon decision processes via continual exploration and updating of the cutting-plane model (Guigues et al., 2019, Zhang et al., 2019, Ju et al., 2023). For strongly convex recourse, quadratic cuts yield improved convergence properties (Guigues et al., 8 Jun 2025).
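
The quadratic-cut idea rests on a standard fact: a $\mu$-strongly convex value function admits quadratic lower bounds. If $V_t$ is $\mu$-strongly convex with subgradient $\beta_t^k$ at the trial point $x_{t-1}^k$, then

$$V_t(x) \;\ge\; V_t(x_{t-1}^k) + \bigl\langle \beta_t^k,\, x - x_{t-1}^k \bigr\rangle + \frac{\mu}{2}\,\bigl\|x - x_{t-1}^k\bigr\|^2,$$

so each cut is strictly tighter than its affine counterpart away from the trial point, which is the structure SQDP exploits (Guigues et al., 8 Jun 2025).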

3.5 Dual and Risk-Averse Formulations

Dual SDDP extends the primal framework to risk-averse programs by constructing upper approximations to the value functions via a Bellman-type recursion on the dual problem. This approach, especially under coherent polyhedral risk measures, provides deterministic upper bounds converging to the true optimal value and is particularly relevant for risk-averse stochastic scheduling and planning (Costa et al., 2021).

4. Practical Implementation and Scalability Considerations

SDDP has demonstrated scalability to large-scale, high-dimensional problems, e.g., multistage container logistics, power systems planning, and microgrid control (Vassos et al., 3 May 2025, Pacaud et al., 2022). Practical implementation guidelines include:

  • Efficient representation, pruning, and updating of cut bundles (see the pruning sketch after this list).
  • On-the-fly scenario generation with copula-based sampling to capture correlated uncertainties.
  • Warm-starting and dual solution caching to accelerate LP or convex program solves.
  • Block-triangular exploitation in network-structured optimization problems.
  • Adaptive stopping criteria that track true policy quality improvements via regret or upper-lower gap plateaus.
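
As one example of cut-bundle management, the sketch below implements a simple "keep only cuts active at visited trial points" heuristic for one-dimensional states. This is one common cut-selection strategy; the rule and data are illustrative, not from a specific cited paper.

```python
import numpy as np

def prune_cuts(cuts, trial_points):
    # keep a cut (theta, beta, xbar) only if it is the maximal (active) cut
    # of the piecewise-affine model at some visited trial point
    keep = set()
    for x in trial_points:
        vals = [theta + beta * (x - xbar) for (theta, beta, xbar) in cuts]
        keep.add(int(np.argmax(vals)))                # index of the active cut
    return [cuts[i] for i in sorted(keep)]

cuts = [(0.0, 0.0, 0.0), (2.0, -1.0, 3.0), (1.9, -1.0, 3.1)]
print(prune_cuts(cuts, trial_points=[0.0, 2.5, 5.0]))  # drops the dominated cut
```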

In large industrial settings, SDDP can be run with partition-based (aggregate) cut generation to reduce early-iteration computational burden, progressively refining scenario clusters only as required (Siddig et al., 2019).

5. Applications, Empirical Results, and Limitations

SDDP and its variants have been applied to multistage portfolio optimization (both with fixed and random horizons), supply chain management, stochastic optimal control for microgrids, and hydrothermal scheduling (Guigues, 2018, Vassos et al., 3 May 2025, Pacaud et al., 2022, Guigues et al., 2019).

Key empirical findings include:

  • Random-horizon SDDP policies yield significant out-of-sample gains (1–10% higher mean returns) and are substantially faster than deterministic-horizon SDDP due to early termination of forward simulations (Guigues, 2018).
  • In supply chain and logistics, SDDP scales to hundreds of variables and constraints per stage, with regret and policy quality plateauing rapidly and only inflow uncertainty substantially impacting cost (Vassos et al., 3 May 2025).
  • For microgrid energy management, SDDP outperforms model predictive control (MPC) in both average and per-scenario cost, with negligible online policy evaluation time per stage (Pacaud et al., 2022).

Main limitations of standard SDDP include its dependence on stagewise independence (unless extended as above), its requirement of convex/linear recourse (unless advanced cut types are used), and iteration counts that grow exponentially with the state dimension. Discrete noise, or a tractable discretization, is needed for certain random-horizon and scenario-based formulations (Guigues, 2018).

6. Summary Table: SDDP Algorithmic Features and Variants

| Variant | Key Extension | Technical Tools | Primary Papers |
| --- | --- | --- | --- |
| Classical SDDP | Polyhedral cuts, i.i.d. noise | Primal–dual LP, convexity, Benders cuts | (Lan, 2019, Pacaud et al., 2022) |
| SDDP–Random Horizon | Stopping time, state augmentation | Death indicator, cut splitting | (Guigues, 2018) |
| Conditional Cuts SDDP | Markovian uncertainty | Functional regression, conditional expectations | (Van-Ackooij et al., 2017) |
| SDDP-REG | Regularization | Proximal penalty in forward pass | (Guigues et al., 2017) |
| Inexact SDDP | Approximate solves | Inexact cuts, primal/dual relaxation | (Guigues, 2018) |
| StoDCuP | Nonlinear convex recourse | Bundle cuts for cost/constraints | (Guigues et al., 2019) |
| Dual SDDP | Risk-averse, dual recursion | Piecewise-linear upper bounding | (Costa et al., 2021) |
| SQDP | Strong convexity, quadratic cuts | Quadratic cuts for value functions | (Guigues et al., 8 Jun 2025) |

Each variant, while maintaining SDDP’s core forward–backward, cut-based recursion, is tailored to problem structure, uncertainty model, solution tolerances, and computational constraints.


References:

  • Guigues, 2018
  • Van-Ackooij et al., 2017
  • Akian et al., 2018
  • Vassos et al., 3 May 2025
  • Guigues et al., 2019
  • Pacaud et al., 2022
  • Zhang et al., 2019
  • Lan, 2019
  • Costa et al., 2021
  • Dai et al., 2021
  • Siddig et al., 2019
  • Guigues et al., 2017
  • Guigues et al., 8 Jun 2025
  • Ju et al., 2023
  • Lebedev et al., 2020
  • Guigues, 2018
