
Stochastic Dual Dynamic Programming

Updated 11 December 2025
  • SDDP is an algorithm that constructs piecewise-affine approximations of value functions in large-scale multistage stochastic convex programs.
  • It employs forward–backward passes to solve Bellman recursions efficiently under convexity, stagewise independence, and related assumptions.
  • Extensions of SDDP cover risk-averse, Markovian, random-horizon, and nonconvex cases, with applications in energy planning, portfolio optimization, and supply chain management.

Stochastic Dual Dynamic Programming (SDDP) is a cutting-plane approximate dynamic programming algorithm designed to efficiently solve large-scale multistage stochastic convex programs, particularly those with high-dimensional state spaces, recourse decisions that link stages, or stagewise-coupled uncertainties. SDDP proceeds via a two-pass (forward–backward) iterative scheme, constructing a piecewise-affine lower-bounding model of the value (cost-to-go) functions through the accumulation of Benders-type cuts. The algorithm operates under fundamental structural assumptions such as convexity, relatively complete recourse, and, in the standard formulation, stagewise independence of the uncertainty. Over the past decade, SDDP's core methodology has been extended to cover nonlinear, risk-averse, nonconvex, Markovian, regularized, inexact, and infinite-horizon formulations. SDDP and its variants have seen extensive adoption in energy planning, portfolio optimization, supply chain management, and infrastructure management.

1. Mathematical Formulation and Bellman Recursion

The canonical SDDP setting is the $T$-stage risk-neutral multistage stochastic convex program:

$$\min_{x_1\in X_1} \;\mathbb{E}_{\xi_1}\Bigl[f_1(x_1,\xi_1) + \mathbb{E}_{\xi_2}\bigl[\min_{x_2\in X_2(x_1,\xi_2)} f_2(x_2,\xi_2) + \cdots \bigr] \Bigr]$$

where at each stage $t$:

  • $\xi_t$ is the exogenous random data (discrete or continuous, often stagewise independent).
  • $x_t \in X_t(x_{t-1},\xi_t)$ is the recourse/control variable, with $x_0$ given.
  • $f_t$ is the stage cost; all sets and functions are convex.

The dynamic programming recursion (the Bellman equations) takes the form:

$$V_T(x_{T-1}) = \mathbb{E}_{\xi_T}\Bigl[\min_{x_T\in X_T(x_{T-1},\xi_T)} f_T(x_T,\xi_T)\Bigr]$$

$$V_t(x_{t-1}) = \mathbb{E}_{\xi_t}\Bigl[\min_{x_t\in X_t(x_{t-1},\xi_t)} \bigl\{f_t(x_t,\xi_t) + V_{t+1}(x_t)\bigr\}\Bigr],\quad t = T-1,\dots,1$$

where $V_t(\cdot)$ are the cost-to-go functions, convex and polyhedral under suitable assumptions (Lan, 2019, Pacaud et al., 2022).
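
To make the recursion concrete, the sketch below solves the Bellman equations exactly on a small discretized problem: a single reservoir with storage $x_t$, random inflow $\xi_t$, and a convex stage cost. All model data are illustrative assumptions, not taken from any cited paper.

```python
import numpy as np

# Exact backward Bellman recursion on a discretized toy reservoir problem.
T = 4
levels = np.linspace(0.0, 10.0, 101)     # discretized state grid for x_t
inflows = np.array([0.0, 2.0, 4.0])      # support of xi_t (stagewise independent)
probs = np.array([0.3, 0.4, 0.3])
demand, spot_price = 3.0, 5.0

V = np.zeros((T + 2, len(levels)))       # V[T+1] is the zero terminal function
for t in range(T, 0, -1):                # backward sweep over stages
    for i, x_prev in enumerate(levels):
        exp_cost = 0.0
        for xi, p in zip(inflows, probs):
            avail = x_prev + xi                     # water available this stage
            feas = levels[levels <= avail]          # feasible next states x_t
            release = avail - feas                  # implied release decision
            # convex stage cost: buy any shortfall at the spot price
            stage = spot_price * np.maximum(demand - release, 0.0)
            v_next = np.interp(feas, levels, V[t + 1])
            exp_cost += p * np.min(stage + v_next)  # inner min, then expectation
        V[t, i] = exp_cost

print("V_1 at x_0 = 5:", V[1, np.searchsorted(levels, 5.0)])
```

On a one-dimensional state this enumeration is instant, but the grid grows exponentially with the state dimension; SDDP's cut-based approximation, described next, avoids gridding the state space altogether.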

In random-horizon extensions, a stopping time $T:\Omega\to\{2,\dots,T_{\max}\}$ is introduced, leading to augmented Bellman recursions with a "death" indicator $D_{t-1}$ and cost modification $D_{t-1} f_t(x_t,x_{t-1},\xi_t)$ (Guigues, 2018).

2. SDDP Algorithm: Forward–Backward Passes and Cut Management

SDDP operates by iteratively constructing polyhedral underestimators for the value functions $V_t$. The core steps at iteration $k$ are:

Forward Pass:

  • Sample a scenario (or multiple in parallel), constituting a realization $(\tilde\xi_1,\dots,\tilde\xi_T)$ (and the associated indicator process in random-horizon variants).
  • For $t=1,\dots,T$ recursively solve:

$$x_t^k = \arg\min_{x\in X_t(x_{t-1}^k,\tilde\xi_t)} f_t(x,x_{t-1}^k,\tilde\xi_t) + V_{t+1}^{k-1}(x)$$

  • Record the trial states $x_t^k$ and costs.

Backward Pass:

  • For $t=T,\dots,2$ (in reverse), and for each realization $\xi_{tj}$ in the support of $\xi_t$:

    • Solve the subproblem:

$$Q_t^k(x_{t-1}^k,\xi_{tj}) = \min_{x_t\in X_t(x_{t-1}^k,\xi_{tj})} f_t(x_t,x_{t-1}^k,\xi_{tj}) + V_{t+1}^k(x_t)$$

    • Obtain optimal dual multipliers and compute a subgradient $\beta_{tj}^k$; set the intercept $\theta_{tj}^k$.

  • Aggregate cuts (by probability weighting) and update $V_t^k$ as the maximum of all existing and new cuts:

$$V_t^k(x) = \max\bigl\{ V_t^{k-1}(x),\; \theta_t^k + \langle\beta_t^k,\, x-x_{t-1}^k\rangle \bigr\}$$

This process is repeated until the gap between the deterministic lower bound and a (typically statistical) upper bound falls below a pre-specified tolerance, or another stopping rule is satisfied (Lan, 2019, Vassos et al., 3 May 2025, Pacaud et al., 2022).
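
The sketch below implements this forward–backward loop end-to-end on a toy one-dimensional reservoir problem, using `scipy.optimize.linprog` (HiGHS) for the stage LPs and the equality-constraint duals as cut slopes. The model data, the `solve_stage` helper, and the cut-management choice (one probability-weighted cut per stage per iteration) are illustrative assumptions, not a reference implementation of any cited variant.

```python
import numpy as np
from scipy.optimize import linprog

# Toy model: storage x in [0, xmax]; decisions: release u, spill s, thermal
# generation g (unit cost c); dynamics x = x_prev + xi - u - s; demand
# balance u + g = d; terminal value V_{T+1} = 0. All data are illustrative.
T, xmax, c, d, x0 = 4, 10.0, 5.0, 3.0, 5.0
xi_vals, xi_probs = np.array([0.0, 2.0, 4.0]), np.array([0.3, 0.4, 0.3])
rng = np.random.default_rng(0)
# cuts[t] under-approximates V_t; the initial (0, 0, 0) cut encodes phi >= 0
cuts = {t: [(0.0, 0.0, 0.0)] for t in range(2, T + 2)}

def solve_stage(x_prev, xi, stage_cuts):
    # variables z = [x, u, s, g, phi]; minimize c*g + phi
    obj = [0.0, 0.0, 0.0, c, 1.0]
    A_eq = [[1.0, 1.0, 1.0, 0.0, 0.0],   # x + u + s = x_prev + xi (mass balance)
            [0.0, 1.0, 0.0, 1.0, 0.0]]   # u + g = d (demand balance)
    b_eq = [x_prev + xi, d]
    # each cut (theta, beta, xbar) imposes phi >= theta + beta * (x - xbar)
    A_ub = [[beta, 0.0, 0.0, 0.0, -1.0] for (theta, beta, xbar) in stage_cuts]
    b_ub = [beta * xbar - theta for (theta, beta, xbar) in stage_cuts]
    bounds = [(0, xmax), (0, None), (0, None), (0, None), (None, None)]
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    # dual of the mass-balance row = subgradient of the stage value in x_prev
    return res.x[0], res.fun, res.eqlin.marginals[0]

for k in range(30):
    # ---- forward pass: sample one scenario and record trial states x_t^k
    xs, x_prev = [x0], x0
    for t in range(1, T + 1):
        xi = rng.choice(xi_vals, p=xi_probs)
        x_t, _, _ = solve_stage(x_prev, xi, cuts[t + 1])
        xs.append(x_t)
        x_prev = x_t
    # ---- backward pass: add one probability-weighted (averaged) cut per stage
    for t in range(T, 1, -1):
        theta = beta = 0.0
        for xi, p in zip(xi_vals, xi_probs):
            _, q, slope = solve_stage(xs[t - 1], xi, cuts[t + 1])
            theta += p * q
            beta += p * slope
        cuts[t].append((theta, beta, xs[t - 1]))

# deterministic lower bound: expected first-stage value under the cut model
lb = sum(p * solve_stage(x0, xi, cuts[2])[1] for xi, p in zip(xi_vals, xi_probs))
print("lower bound after 30 iterations:", lb)
```

Because each backward sweep only adds cuts, the lower bound is monotonically nondecreasing across iterations; a statistical upper bound (omitted here) can be estimated by averaging accumulated stage costs over many forward-pass scenarios.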

3. Extensions, Variants, and Theoretical Properties

3.1 Conditional Cuts and Markovian Dynamics

When the stagewise independence assumption is relaxed (e.g., $\{\xi_t\}$ forms a Markov chain), SDDP with conditional cuts estimates the coefficients $\alpha_t(\xi_{t-1})$ and $\beta_t(\xi_{t-1})$ as functions of the previous stage's uncertainty. These coefficients are computed as conditional expectations, typically via regression over a suitable function basis (e.g., local affine regression), naturally accommodating path dependence without inflating the physical state space (Van-Ackooij et al., 2017). This approach retains the decomposition and tractability properties of SDDP and achieves almost-sure convergence, provided the regression error vanishes asymptotically.
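
As a concrete illustration of the regression step, the sketch below fits the cut slope $\beta_t$ as an affine function of the previous-stage noise $\xi_{t-1}$ from synthetic samples; the actual method of Van-Ackooij et al. (2017) uses local affine regression over a richer basis, and all data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
xi_prev = rng.uniform(0.0, 4.0, size=50)                   # samples of xi_{t-1}
beta = -2.0 - 0.5 * xi_prev + 0.05 * rng.normal(size=50)   # synthetic cut slopes

# design matrix [1, xi]; solve min ||A w - beta||^2 for w = (intercept, slope)
A = np.column_stack([np.ones_like(xi_prev), xi_prev])
w, *_ = np.linalg.lstsq(A, beta, rcond=None)

def beta_hat(xi):
    # estimated conditional cut slope beta_t(xi_{t-1})
    return w[0] + w[1] * xi

print("estimated slope at xi_{t-1} = 2:", beta_hat(2.0))
```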

3.2 Random-Horizon Problems

For processes with a random stopping time $T$, SDDP is adapted by augmenting the state with a Boolean death indicator $D_{t-1}$, resulting in cut approximations over the augmented state $(x, D_{t-1})$, with recourse costs nullified beyond the stopping event. Random-horizon SDDP requires generating cuts at all stages up to $T_{\max}$ but typically converges faster and saves CPU time, as forward passes are often truncated early (Guigues, 2018).
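
A minimal sketch of the forward-pass truncation that drives these CPU savings, under an assumed (illustrative) geometric stopping model and a generic per-stage `step` callable:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_stopping_time(t_max, p_stop=0.2):
    # illustrative geometric stopping time on {2, ..., t_max}
    for t in range(2, t_max + 1):
        if rng.random() < p_stop:
            return t
    return t_max

def forward_pass_random_horizon(x0, t_max, step):
    # `step(t, x)` is an assumed callable solving the stage-t subproblem and
    # returning the next trial state; the pass stops at the sampled horizon
    # instead of always running to t_max, which is where the savings come from
    horizon = sample_stopping_time(t_max)
    x = x0
    for t in range(1, horizon + 1):
        x = step(t, x)
    return horizon, x

# usage with a dummy step that merely decays the state
print(forward_pass_random_horizon(5.0, 10, step=lambda t, x: 0.9 * x))
```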

3.3 Complexity, Regularization, and Inexactness

Theoretical analysis shows that SDDP's iteration complexity grows polynomially with the number of stages $T$ (linear in $T$ for discounted problems), but exponentially with the per-stage state dimension $n_t$ (Lan, 2019). Regularization schemes (e.g., SDDP-REG), which introduce stage- or iteration-dependent proximal terms in the forward pass, have demonstrated significant performance improvements by stabilizing trial points and accelerating convergence (Guigues et al., 2017). Inexact SDDP and related variants allow primal and dual subproblem solves to be carried out up to bounded or vanishing errors, maintaining convergence guarantees with explicit error bounds (Guigues, 2018).
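
One plausible form of the regularized forward-pass problem (a sketch of the SDDP-REG idea; the exact prox center and parameter schedule vary across variants):

$$x_t^k \in \arg\min_{x\in X_t(x_{t-1}^k,\tilde\xi_t)} \Bigl\{ f_t(x,x_{t-1}^k,\tilde\xi_t) + V_{t+1}^{k-1}(x) + \frac{\rho_k}{2}\,\bigl\|x - x_t^{k-1}\bigr\|^2 \Bigr\}$$

where $x_t^{k-1}$ is the previous iteration's trial point and $\rho_k \ge 0$ is a (possibly vanishing) proximal weight. Since the penalty only affects trial-point selection, the backward pass and the validity of the cuts are unchanged.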

3.4 Nonlinear, Mixed-Integer, and Infinite Horizon

SDDP has been generalized to convex nonlinear stochastic programs (using subgradients from convex duality), multistage stochastic mixed-integer nonlinear programs with generalized conjugacy cuts (ensuring global optimality in subclasses), and infinite-horizon decision processes via continual exploration and updating of the cutting-plane model (Guigues et al., 2019, Zhang et al., 2019, Ju et al., 2023). For strongly convex recourse, quadratic cuts yield improved convergence properties (Guigues et al., 8 Jun 2025).
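
The quadratic-cut idea rests on a standard fact: a $\mu$-strongly convex value function admits quadratic lower bounds. If $V_t$ is $\mu$-strongly convex with subgradient $\beta_t^k$ at the trial point $x_{t-1}^k$, then

$$V_t(x) \;\ge\; V_t(x_{t-1}^k) + \bigl\langle \beta_t^k,\, x - x_{t-1}^k \bigr\rangle + \frac{\mu}{2}\,\bigl\|x - x_{t-1}^k\bigr\|^2,$$

so each cut is strictly tighter than its affine counterpart away from the trial point, which is the structure SQDP exploits (Guigues et al., 8 Jun 2025).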

3.5 Dual and Risk-Averse Formulations

Dual SDDP extends the primal framework to risk-averse programs by constructing upper approximations to the value functions via a Bellman-type recursion on the dual problem. This approach, especially under coherent polyhedral risk measures, provides deterministic upper bounds converging to the true optimal value and is particularly relevant for risk-averse stochastic scheduling and planning (Costa et al., 2021).

4. Practical Implementation and Scalability Considerations

SDDP has demonstrated scalability to large-scale, high-dimensional problems, e.g., multistage container logistics, power systems planning, and microgrid control (Vassos et al., 3 May 2025, Pacaud et al., 2022). Practical implementation guidelines include:

  • Efficient representation, pruning, and updating of cut bundles (see the pruning sketch after this list).
  • On-the-fly scenario generation with copula-based sampling to capture correlated uncertainties.
  • Warm-starting and dual solution caching to accelerate LP or convex program solves.
  • Block-triangular exploitation in network-structured optimization problems.
  • Adaptive stopping criteria that track true policy quality improvements via regret or upper-lower gap plateaus.
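
As one example of cut-bundle management, the sketch below implements a simple "keep only cuts active at visited trial points" heuristic for one-dimensional states. This is one common cut-selection strategy; the rule and data are illustrative, not from a specific cited paper.

```python
import numpy as np

def prune_cuts(cuts, trial_points):
    # keep a cut (theta, beta, xbar) only if it is the maximal (active) cut
    # of the piecewise-affine model at some visited trial point
    keep = set()
    for x in trial_points:
        vals = [theta + beta * (x - xbar) for (theta, beta, xbar) in cuts]
        keep.add(int(np.argmax(vals)))                # index of the active cut
    return [cuts[i] for i in sorted(keep)]

cuts = [(0.0, 0.0, 0.0), (2.0, -1.0, 3.0), (1.9, -1.0, 3.1)]
print(prune_cuts(cuts, trial_points=[0.0, 2.5, 5.0]))  # drops the dominated cut
```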

In large industrial settings, SDDP can be run with partition-based (aggregate) cut generation to reduce early-iteration computational burden, progressively refining scenario clusters only as required (Siddig et al., 2019).

5. Applications, Empirical Results, and Limitations

SDDP and its variants have been applied to multistage portfolio optimization (both with fixed and random horizons), supply chain management, stochastic optimal control for microgrids, and hydrothermal scheduling (Guigues, 2018, Vassos et al., 3 May 2025, Pacaud et al., 2022, Guigues et al., 2019).

Key empirical findings include:

  • Random-horizon SDDP policies yield significant out-of-sample gains (1–10% higher mean returns) and are substantially faster than deterministic-horizon SDDP due to early termination of forward simulations (Guigues, 2018).
  • In supply chain and logistics, SDDP scales to hundreds of variables and constraints per stage, with regret and policy quality plateauing rapidly and only inflow uncertainty substantially impacting cost (Vassos et al., 3 May 2025).
  • For microgrid energy management, SDDP outperforms model predictive control (MPC) in both average and per-scenario cost, with negligible online policy evaluation time per stage (Pacaud et al., 2022).

Main limitations of standard SDDP include its dependence on stagewise independence (unless extended as above), its requirement of convex/linear recourse (unless advanced cut types are used), and iteration counts that grow exponentially with the state dimension. Discrete noise, or a tractable discretization, is needed for certain random-horizon and scenario-based formulations (Guigues, 2018).

6. Summary Table: SDDP Algorithmic Features and Variants

| Variant | Key Extension | Technical Tools | Primary Papers |
| --- | --- | --- | --- |
| Classical SDDP | Polyhedral cuts, i.i.d. noise | Primal–dual LP, convexity, Benders cuts | (Lan, 2019, Pacaud et al., 2022) |
| SDDP–Random Horizon | Stopping time, state augmentation | Death indicator, cut splitting | (Guigues, 2018) |
| Conditional Cuts SDDP | Markovian uncertainty | Functional regression, conditional expectations | (Van-Ackooij et al., 2017) |
| SDDP-REG | Regularization | Proximal penalty in forward pass | (Guigues et al., 2017) |
| Inexact SDDP | Approximate solves | Inexact cuts, primal/dual relaxation | (Guigues, 2018) |
| StoDCuP | Nonlinear convex recourse | Bundle cuts for cost/constraints | (Guigues et al., 2019) |
| Dual SDDP | Risk-averse, dual recursion | Piecewise-linear upper bounding | (Costa et al., 2021) |
| SQDP | Strong convexity, quadratic cuts | Quadratic cuts for value functions | (Guigues et al., 8 Jun 2025) |

Each variant, while maintaining SDDP’s core forward–backward, cut-based recursion, is tailored to problem structure, uncertainty model, solution tolerances, and computational constraints.


References:

  • Guigues, 2018
  • Van-Ackooij et al., 2017
  • Akian et al., 2018
  • Vassos et al., 3 May 2025
  • Guigues et al., 2019
  • Pacaud et al., 2022
  • Zhang et al., 2019
  • Lan, 2019
  • Costa et al., 2021
  • Dai et al., 2021
  • Siddig et al., 2019
  • Guigues et al., 2017
  • Guigues et al., 8 Jun 2025
  • Ju et al., 2023
  • Lebedev et al., 2020
  • Guigues, 2018
