Stochastic Dual Dynamic Programming
- SDDP is a decomposition algorithm that approximates cost-to-go functions in multistage stochastic optimization by representing them as a supremum of affine cuts.
- It employs both single-cut and multicut strategies to efficiently handle dynamic programming recursion and balance approximation accuracy with computational tractability.
- Advanced cut management techniques such as the Level 1 and Limited Memory Level 1 (LML 1) selectors improve performance by limiting the number of stored cuts while preserving convergence guarantees.
Stochastic Dual Dynamic Programming (SDDP) is a fundamental decomposition-based algorithm for the numerical solution of multistage stochastic linear and convex optimization problems with recourse. SDDP has achieved prominence in operations research, stochastic control, and energy system planning, owing to its ability to efficiently approximate cost-to-go (value) functions as polyhedral lower envelopes (suprema of affine cuts), thus mitigating the curse of dimensionality inherent in classical scenario tree and dynamic programming approaches.
1. Problem Class and Dynamic Programming Recursion
SDDP targets the $T$-stage risk-neutral multistage stochastic linear program, classically written in dynamic programming form as follows. At each stage $t = 1, \dots, T$, a decision $x_t$ is made after observing the current realization of the uncertainties $\xi_t = (c_t, A_t, B_t, b_t)$, which are assumed to have finite discrete support and to be stagewise independent. The value functions satisfy the nested Bellman recursions
$$Q_t(x_{t-1}, \xi_t) = \min_{x_t} \left\{ c_t^\top x_t + \mathcal{Q}_{t+1}(x_t) \;:\; A_t x_t + B_t x_{t-1} = b_t,\ x_t \ge 0 \right\},$$
where $\mathcal{Q}_{t+1}(x_t) = \mathbb{E}_{\xi_{t+1}}\left[ Q_{t+1}(x_t, \xi_{t+1}) \right]$ for $t < T$ and $\mathcal{Q}_{T+1} \equiv 0$.
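As a concrete illustration, the recursion can be evaluated by backward induction on a tiny two-stage example. The sketch below uses entirely made-up data (a toy inventory problem: order `x1` at unit cost `c`, then pay penalty `p` per unit of unmet demand `xi_2` drawn from a finite equiprobable support); it is not the algorithm from the text, just the Bellman recursion computed exactly on a grid.

```python
import numpy as np

# Toy two-stage problem with hypothetical data: order x1 at cost c, then pay
# penalty p per unit of shortfall against demand xi_2 (finite equiprobable support).
c, p = 1.0, 3.0
support = np.array([1.0, 2.0, 4.0])      # finite support of xi_2
probs = np.full(len(support), 1 / 3)

def Q2(x1, xi):
    # Stage-2 recourse LP  min { p*y : y >= xi - x1, y >= 0 }, solved in closed form
    return p * max(xi - x1, 0.0)

def Q2_bar(x1):
    # Expected cost-to-go: finite support lets us average over the scenarios
    return float(np.dot(probs, [Q2(x1, xi) for xi in support]))

# First-stage problem  min_{x1 >= 0} c*x1 + Q2_bar(x1), solved here on a grid
grid = np.linspace(0.0, 5.0, 501)
values = [c * x + Q2_bar(x) for x in grid]
x_star = grid[int(np.argmin(values))]
print(x_star, min(values))
```

For this data the objective is piecewise linear and flat on $[2, 4]$, so any point in that interval is optimal with cost 4.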
Assumptions for convergence include nonempty, bounded recourse sets for all feasible histories; independent, finite-support random data beyond stage 1; and the ability to solve each stage linear program to an extreme point solution (Guigues et al., 2019).
2. Algorithm Structure: Single-Cut and Multicut SDDP
SDDP algorithms decompose the dynamic program by recursively approximating each cost-to-go function from below using a supremum of affine functions ("cuts") derived from dual solutions of backward pass subproblems.
- Single-Cut SDDP maintains, for each stage $t$, a polyhedral under-approximation
$$\mathcal{Q}_t^k(x_{t-1}) = \max_{1 \le j \le k} \left\{ \theta_t^j + \langle \beta_t^j, x_{t-1} \rangle \right\} \le \mathcal{Q}_t(x_{t-1}),$$
with cuts generated at the trial points $x_{t-1}^j$ visited in the $j$-th forward pass.
- Multicut SDDP (MuDA), instead of collapsing the expectation in $\mathcal{Q}_t$, constructs for each scenario $\xi_{t,j}$ at stage $t$ a local recourse approximation $Q_{t,j}^k$ and then averages: $\mathcal{Q}_t^k(x_{t-1}) = \sum_j p_{t,j}\, Q_{t,j}^k(x_{t-1})$.
In both frameworks, each new cut is affine and globally valid, i.e., it under-estimates the true cost-to-go function at every feasible state, not just at the trial point where it was generated.
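The two evaluation rules can be sketched in a few lines. The cut coefficients below are made-up placeholders; the point is only the mechanics: single-cut takes a pointwise maximum over one shared cut pool, while multicut keeps one pool per scenario and combines them with the scenario probabilities.

```python
import numpy as np

# A cut is a pair (theta, beta) representing the affine function theta + <beta, x>.
def eval_single_cut(cuts, x):
    # Polyhedral under-approximation: pointwise maximum over all affine cuts
    return max(theta + float(np.dot(beta, x)) for theta, beta in cuts)

def eval_multicut(scenario_cuts, probs, x):
    # Multicut: one cut pool per scenario, combined with the scenario probabilities
    return sum(p * eval_single_cut(cuts, x)
               for p, cuts in zip(probs, scenario_cuts))

cuts = [(0.0, np.array([1.0])), (2.0, np.array([-1.0]))]   # hypothetical cuts
x = np.array([0.5])
v_single = eval_single_cut(cuts, x)                        # max(0.5, 1.5)
v_multi = eval_multicut([cuts, [(1.0, np.array([0.0]))]], [0.5, 0.5], x)
print(v_single, v_multi)
```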
The forward pass samples a trajectory through the scenario tree, solving a deterministic chain of LPs using the current cut approximations. The backward pass, for each trajectory state and scenario, solves the dual LP to generate a new cut at the visited point (Guigues et al., 2019).
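The forward/backward mechanics can be seen most simply in the two-stage ($T = 2$) special case, where SDDP reduces to a Benders-style cutting-plane loop. The sketch below reuses a toy one-dimensional recourse problem with made-up data, and obtains cut slopes from a closed-form subgradient rather than from dual LP solutions; a real implementation solves the stage LPs and reads off the duals.

```python
import numpy as np

# Hypothetical two-stage data: first-stage cost c*x, recourse penalty p per unit
# of shortfall against demand xi with finite equiprobable support.
c, p = 1.0, 3.0
support = np.array([1.0, 2.0, 4.0])
probs = np.full(3, 1 / 3)

def Q_bar(x):           # expected recourse cost  E[ p * max(xi - x, 0) ]
    return float(probs @ (p * np.maximum(support - x, 0.0)))

def Q_bar_subgrad(x):   # a subgradient of Q_bar: -p * P(xi > x)
    return float(-p * probs[support > x].sum())

cuts = [(0.0, 0.0)]     # trivial valid cut: Q_bar >= 0 everywhere
x_hi = 10.0             # bounded first-stage feasible set [0, x_hi]
for _ in range(20):
    # Forward pass: minimize c*x + max_j (theta_j + beta_j * x) over [0, x_hi]
    # (one-dimensional and piecewise linear, so a fine grid suffices here)
    grid = np.linspace(0.0, x_hi, 1001)
    lower = [x + max(t + b * x for t, b in cuts) for x in grid]
    x_trial = grid[int(np.argmin(lower))]
    # Backward pass: add a cut supporting Q_bar at the trial point
    g = Q_bar_subgrad(x_trial)
    cuts.append((Q_bar(x_trial) - g * x_trial, g))

lb = min(lower)                        # lower bound from the cut model
ub = c * x_trial + Q_bar(x_trial)      # cost of the last trial point
print(lb, ub)
```

After a few iterations the lower bound from the cut model and the cost of the trial point coincide at the optimal value (4.0 for this data), which is the usual SDDP stopping signal.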
3. Cut Management: Selection Strategies
The number of cuts grows linearly with the number of iterations, so cut selection (pruning) becomes crucial for computational efficiency. Guigues et al. (2019) introduce a selector framework:
- Level 1: Keep every cut that has ever been active (maximal) at any trial point. Guarantees the same lower-bound evolution as unpruned SDDP, but may accumulate many cuts.
- Limited Memory Level 1 (LML 1): For each trial point, retain only the oldest cut among all those tied for maximality. This is far more aggressive, drastically reducing the number of active cuts.
At each backward pass, active cuts are recomputed for all historical trial points and then updated according to the selection rule. This approach applies both to single-cut SDDP and its multicut analogs.
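The two selectors can be sketched as follows. The cut data and trial points are made up, and the tie-breaking rule (keep the lowest index, i.e. the oldest cut, among those tied for the maximum) is our assumed reading of the LML 1 rule described above.

```python
import numpy as np

# Cuts as (theta, beta) pairs; trial points are the states where activity is checked.
def select(cuts, trial_points, limited_memory=False):
    # Level 1: keep every cut that attains the maximum at some trial point.
    # LML 1 (assumed tie-break): keep only the oldest such cut per trial point.
    keep = set()
    for x in trial_points:
        vals = np.array([t + float(np.dot(b, x)) for t, b in cuts])
        tied = np.flatnonzero(vals >= vals.max() - 1e-9)
        keep.update([int(tied[0])] if limited_memory else tied.tolist())
    return sorted(keep)

cuts = [(0.0, np.array([1.0])),    # oldest cut
        (2.0, np.array([-1.0])),
        (0.0, np.array([1.0]))]    # duplicate of the first, generated later
pts = [np.array([0.0]), np.array([3.0])]
print(select(cuts, pts))                       # Level 1 keeps all tied cuts
print(select(cuts, pts, limited_memory=True))  # LML 1 drops the newer duplicate
```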
4. Convergence Analysis
Under the standing assumptions (stagewise independence, finite support, nonempty/bounded recourse, independent path sampling, and exact LP solutions), SDDP with any selector satisfying a monotonicity property (e.g., Level 1 or LML 1) exhibits almost-sure finite convergence:
- There exists an iteration $k_0$ such that for all $k \ge k_0$, both the pool of locally active cuts