DP-Rewrite: Dynamic Programming Reforms

Updated 5 March 2026

DP-Rewrite is a collection of methodologies that refactor dynamic programming recurrences to reveal new algorithmic insights and scalable implementations.
It leverages techniques such as learnable architectures, semiring algebra, dual transformations, and graph-theoretic analysis to optimize computation.
These approaches enable efficient, parallel, and transfer-optimized solutions for complex problems ranging from Markov decision processes to combinatorial optimization.

Dynamic programming (DP) rewrite, also referred to as "DP-Rewrite", encompasses a family of methodologies that transform, refactor, or reformulate DP recurrences and their algorithmic realizations. These approaches yield new algorithmic principles, scalable implementations, and insight into the algebraic or structural essence of DP optimization. DP-Rewrite concepts extend across algorithm unrolling via learnable parametric models, algebraic decompositions, dual or conjugate transformations, structured parallelization, hardware mapping, and rigorous graph-theoretic analysis.

1. Unrolling and Truncating Dynamic Programming as Learnable Architectures

Recent work interprets DP, particularly for Markov decision processes (MDPs), through the lens of parametric, end-to-end learnable models. "Unrolling Dynamic Programming via Graph Filters" introduces BellNet, which rewrites policy iteration as a learnable stack of finite-depth layers, each implementing a truncated graph filter parameterized by layer-specific or shared coefficients (Rozada et al., 29 Jul 2025).

Starting from the Bellman optimality equations for discounted MDPs,

$V^* = T[V^*], \qquad Q^* = r + \gamma P\max_{a'} Q^*,$

BellNet replaces classical fixed-point iterations with a cascade of $L$ layers: $Q^{(\ell+1)} = \sum_{j=0}^{K} h_j^{(\ell)} (P^{\pi^{(\ell)}})^j r + h_{K+1}^{(\ell)} (P^{\pi^{(\ell)}})^{K+1} Q^{(\ell)},$ where $K$ is the maximum graph filter degree, $h_j^{(\ell)}$ are learnable parameters, and $P^{\pi^{(\ell)}}$ is the block-diagonal transition matrix under the current softmax policy.

This truncation, motivated by the Cayley–Hamilton theorem, leads to a concise, unified, and transferable representation of policy and value iteration. Optimization proceeds via end-to-end minimization of the cumulative Bellman error across layers: $J(\theta) = \sum_{\ell=0}^{L-1} \| r + \gamma P^{\pi^{(\ell)}} Q^{(\ell)} - \hat Q^{(\ell+1)}(\theta) \|_2^2,$ where $\theta$ collects the filter coefficients $h_j^{(\ell)}$. Experimental results on grid-like environments indicate that suitable choices of $K$ and $L$ yield accuracy comparable to classical policy iteration at significantly reduced computation, and that the learned coefficients transfer well to shifted or mirrored environments.
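As a rough illustration of the layer equation above, the following NumPy sketch evaluates one BellNet-style layer. It is not the authors' implementation: the function names, the flat `(s, a)` indexing of $Q$ and $r$, the softmax temperature, and the dense construction of $P^{\pi}$ are all assumptions made for the example, and no training loop is shown.

```python
import numpy as np

def softmax_policy(Q, n_states, n_actions, temperature=1.0):
    """Softmax policy pi[s, a] from a flat Q vector of length S*A (assumed layout)."""
    q = Q.reshape(n_states, n_actions) / temperature
    q = q - q.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(q)
    return e / e.sum(axis=1, keepdims=True)

def policy_transition(P, pi):
    """P^pi of shape (S*A, S*A): entry [(s,a),(s',a')] = P[(s,a), s'] * pi[s', a']."""
    n_states, n_actions = pi.shape
    full = P[:, :, None] * pi[None, :, :]  # shape (S*A, S, A)
    return full.reshape(n_states * n_actions, n_states * n_actions)

def bellnet_layer(Q, r, P, h, n_states, n_actions):
    """One layer: sum_{j=0..K} h[j] (P^pi)^j r + h[K+1] (P^pi)^(K+1) Q."""
    P_pi = policy_transition(P, softmax_policy(Q, n_states, n_actions))
    K = len(h) - 2
    out = np.zeros_like(r, dtype=float)
    power_r = np.asarray(r, dtype=float)  # holds (P^pi)^j r
    power_Q = np.asarray(Q, dtype=float)  # holds (P^pi)^j Q
    for j in range(K + 1):
        out += h[j] * power_r
        power_r = P_pi @ power_r
        power_Q = P_pi @ power_Q
    return out + h[K + 1] * power_Q
```

Under these assumptions, fixing the coefficients to $h_j = \gamma^j$ makes a layer coincide with $K+1$ steps of policy evaluation under the current softmax policy, which is one way to sanity-check the sketch before treating the coefficients as learnable.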

2. Algebraic and Semiring-Based Rewrite of Dynamic Programming

The DP paradigm can be rigorously re-expressed via semiring polymorphism and shortcut fusion (Little et al., 2021). In this algebraic formalism, the DP recurrence is specified generically in terms of binary operations $(\oplus, \otimes)$, with semantics deferred to the instantiation of these operators (e.g., $(\min, +)$ for cost minimization, $(+, \times)$ for counting paths).

The generic DP rewrite starts from the specification $s^* = \bigoplus_{l \in \mathcal{L}} \bigotimes_{x \in l} w(x)$ for some set $\mathcal{L}$ (e.g., paths, alignments) and scoring map $w$. By exploiting Wadler's free theorems, the brute-force generate-and-evaluate computation is shortcut-fused into a polynomial-time fold (the Bellman recursion). Constraints are integrated by lifting to a composite semiring, so that additional combinatorial constraints are fused into the fold without additional enumerative cost.

Worked examples (e.g., Needleman–Wunsch alignment) show that the algebraic rewrite yields efficient Bellman recurrences and enables simultaneous computation of primal and auxiliary quantities (e.g., Viterbi path) via semiring tupling.
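The semiring view can be made concrete with a small Python sketch. It is an illustration only, not code from Little et al.: the `Semiring` class, the lattice-path problem, and the function names are assumptions chosen for brevity. The same fold is instantiated with $(\min, +)$ and $(+, \times)$, and a brute-force generate-and-evaluate routine is included to check that the fold computes the same value.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # the "oplus" operation
    times: Callable[[Any, Any], Any]  # the "otimes" operation
    zero: Any                         # identity of plus
    one: Any                          # identity of times

MIN_PLUS = Semiring(min, lambda a, b: a + b, float("inf"), 0)
COUNT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)

def grid_fold(w, sr):
    """Bellman-style fold over right/down lattice paths through the grid w."""
    rows, cols = len(w), len(w[0])
    s = [[sr.zero] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                incoming = sr.one
            else:
                incoming = sr.zero
                if i > 0:
                    incoming = sr.plus(incoming, s[i - 1][j])
                if j > 0:
                    incoming = sr.plus(incoming, s[i][j - 1])
            s[i][j] = sr.times(incoming, w[i][j])
    return s[-1][-1]

def brute_force(w, sr):
    """Generate-and-evaluate: plus over all paths of the times-product of cell weights."""
    rows, cols = len(w), len(w[0])
    steps = rows + cols - 2
    total = sr.zero
    for downs in combinations(range(steps), rows - 1):
        i = j = 0
        score = sr.times(sr.one, w[0][0])
        for t in range(steps):
            if t in downs:
                i += 1
            else:
                j += 1
            score = sr.times(score, w[i][j])
        total = sr.plus(total, score)
    return total

costs = [[1, 3, 1], [1, 5, 1], [4, 2, 1]]
cheapest = grid_fold(costs, MIN_PLUS)                       # minimum path cost
n_paths = grid_fold([[1] * 3 for _ in range(3)], COUNT)     # number of paths
```

The fold visits each cell once, whereas the brute-force version enumerates every path; agreement between the two relies on $\otimes$ distributing over $\oplus$, which is the property the semiring abstraction guarantees.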

3. Dualization and Conjugate-Based DP Rewrite in Optimal Control

For finite-horizon, input-affine control problems, DP-rewrite may exploit convexity and affine structure to convert primal minimizations over controls into additions via conjugacy (Kolarijani et al., 2020). The input-affine dynamics

$x_{t+1} = s(x_t) + i(x_t) u_t$

enable the DP operator to be transformed, via discrete conjugate duality, as follows: $V_k^d(x) = \max_{y \in Y^d} \left\{ \langle s(x), y \rangle - \left[ \ell_x^*(-i(x)y) + V_{k+1}^{d\,\#}(y) \right] \right\},$ where $Y^d$ is a discrete grid of dual variables, $\ell_x^*$ is the conjugate of the stage cost at state $x$, and $V_{k+1}^{d\,\#}$ is the discrete conjugate of the next-stage value function (computed by a linear-time Legendre transform). For separable cost and dynamics, complexity drops from $O(\# X \cdot \# U)$ to $O(\# X + \# U)$, with error bounds that depend on the grid resolution and the problem's Lipschitz constants.
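To see the dual form at work, the sketch below compares one primal DP step with the conjugate-based step on a scalar toy problem. Everything specific here is an assumption for illustration: the dynamics $x_{t+1} = x_t + u_t$ (so $s(x) = x$, $i(x) = 1$), the stage cost $u^2$ with conjugate $v^2/4$, the next-stage value $z^2$, and the grids. The discrete conjugate is computed by brute force rather than by the linear-time Legendre transform, so the sketch checks correctness of the rewrite, not its speed.

```python
def discrete_conjugate(points, values, slopes):
    """Brute-force discrete conjugate f#(y) = max_z { y*z - f(z) } over sampled z."""
    return [max(y * z - fz for z, fz in zip(points, values)) for y in slopes]

def primal_step(x, u_grid, stage_cost, v_next):
    """Primal DP step: min over the input grid of stage cost plus next value."""
    return min(stage_cost(u) + v_next(x + u) for u in u_grid)

def dual_step(x, y_grid, stage_cost_conj, v_next_conj):
    """Dual DP step with s(x) = x and i(x) = 1 (assumed toy dynamics):
    max_y { x*y - [ l*(-y) + V#(y) ] }."""
    return max(x * y - (stage_cost_conj(-y) + vc)
               for y, vc in zip(y_grid, v_next_conj))

# Toy problem data (all hypothetical choices).
stage_cost = lambda u: u * u
stage_cost_conj = lambda v: v * v / 4.0      # conjugate of u^2
v_next = lambda z: z * z

x_grid = [i / 10 for i in range(-10, 11)]    # states in [-1, 1]
u_grid = [k / 20 for k in range(-20, 21)]    # inputs in [-1, 1]
z_grid = [k / 20 for k in range(-40, 41)]    # next states in [-2, 2]
y_grid = [k / 10 for k in range(-20, 21)]    # dual variables in [-2, 2]

v_next_conj = discrete_conjugate(z_grid, [v_next(z) for z in z_grid], y_grid)
primal = [primal_step(x, u_grid, stage_cost, v_next) for x in x_grid]
dual = [dual_step(x, y_grid, stage_cost_conj, v_next_conj) for x in x_grid]
```

For this convex toy problem both steps recover $V(x) = x^2/2$ on the state grid, because the grids were chosen to contain the exact optimizers; on coarser grids the two values would differ by a discretization error.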

4. Graph-Theoretic Formalisms and Strategy Classification

DP problems can be systematically encoded as generalized d-graphs, which represent the subproblem–decomposition dependency structure as a bipartite, weighted digraph with p-vertices (subproblems) and d-vertices (decomposition choices) (Kátai, 2010). The DP-rewrite translates the recurrence: $F(s) = \min_{1 \leq k \leq K(s)} \left\{ g(s, k) + \sum_{i=1}^{m(s,k)} F(s_i(s,k)) \right\}$ into arc relaxations on the d-graph, enabling unification and classification of solution strategies:

d-TOPO: topological ordering for acyclic graphs.
d-DIJK: Dijkstra-style greedy for nonnegative weights.
d-BF: Bellman–Ford-style relaxation for arbitrary (possibly negative) weights, provided no negative cycles exist.

This d-graph formalism yields theoretical guarantees for correctness and complexity, and also supports direct translation from recurrence to dependency graph.
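The translation from recurrence to dependency graph, followed by a topological-order evaluation in the spirit of d-TOPO, can be sketched as follows. The encoding is an assumption made for this example rather than Kátai's notation: each p-vertex maps to a list of d-vertices, and each d-vertex is a pair of a local cost $g(s,k)$ and the list of child subproblems. Matrix-chain multiplication is used as the test problem.

```python
from graphlib import TopologicalSorter

def build_matrix_chain_dgraph(dims):
    """Map each p-vertex (i, j) to its d-vertices, encoded as (g, children)."""
    n = len(dims) - 1
    dgraph = {}
    for i in range(1, n + 1):
        dgraph[(i, i)] = [(0, [])]  # trivial subproblem: one choice, no children
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            dgraph[(i, j)] = [
                (dims[i - 1] * dims[k] * dims[j], [(i, k), (k + 1, j)])
                for k in range(i, j)
            ]
    return dgraph

def solve_topological(dgraph):
    """Evaluate F(s) = min_k { g(s,k) + sum_i F(s_i(s,k)) } in topological order."""
    # Each p-vertex depends on every child of each of its d-vertices.
    deps = {
        s: {c for _, children in choices for c in children}
        for s, choices in dgraph.items()
    }
    F = {}
    for s in TopologicalSorter(deps).static_order():
        F[s] = min(g + sum(F[c] for c in children) for g, children in dgraph[s])
    return F

dims = [10, 30, 5, 60]  # three matrices: 10x30, 30x5, 5x60
F = solve_topological(build_matrix_chain_dgraph(dims))
best = F[(1, 3)]        # cheapest multiplication cost for the full chain
```

Because the dependency graph is acyclic, a single pass in topological order suffices; `TopologicalSorter` raises `CycleError` on cyclic input, which is where the Dijkstra-style or Bellman–Ford-style strategies would be needed instead.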

5. Parallelization and Hardware Mapping of DP as Matrix Operations

DP-Rewrite enables parallel and hardware-accelerated processing for large-scale combinatorial optimization. In the GPU-based split algorithm for vehicle routing and scenario-based stochastic programming, DP recurrences are collapsed into batched masked min–plus matrix–vector products over layered DAGs (Zhao et al., 22 Nov 2025). The forward DP becomes: $F^\omega = W^\omega \otimes F^\omega,$ where $\otimes$ here denotes the min–plus matrix–vector product and $W^\omega$ is the scenario-specific, masked cost matrix. GPU kernels execute over millions of scenarios in parallel, yielding nearly linear speedups and extending tractable DP computation into million-scenario regimes.

The key requirements for such mapping are acyclic (layered) state graphs, action feasibility encoded via resource masks, and absence of control-flow dependencies across transitions. These DP-rewrites transform inherently sequential routines into high-throughput GPU primitives.
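A batched masked min–plus product is easy to express with array broadcasting. The NumPy sketch below runs on the CPU and only mirrors the data layout such a GPU kernel might use; the function names, the per-layer update, and the array shapes (batch of scenarios first) are assumptions for illustration, not the kernel design of Zhao et al.

```python
import numpy as np

def minplus_matvec(W, F):
    """Batched min-plus product: out[b, i] = min_j (W[b, i, j] + F[b, j])."""
    return np.min(W + F[:, None, :], axis=2)

def layered_forward_dp(W_layers, mask_layers, F0):
    """Forward DP over a layered DAG, one min-plus product per layer.

    W_layers[k]    : costs of shape (B, n_next, n_prev) for B scenarios
    mask_layers[k] : boolean feasibility mask of the same shape
    F0             : initial costs of shape (B, n_0)
    """
    F = F0
    for W, mask in zip(W_layers, mask_layers):
        W_masked = np.where(mask, W, np.inf)  # infeasible transitions cost +inf
        F = minplus_matvec(W_masked, F)
    return F

# One scenario, one layer, with the transition into state 1 from state 0 masked out.
W = np.array([[[1.0, 2.0], [5.0, 0.0]]])
mask = np.array([[[True, True], [False, True]]])
F0 = np.array([[0.0, 10.0]])
F1 = layered_forward_dp([W], [mask], F0)
```

The loop over layers remains sequential, but every scenario and every target state within a layer is computed by the same data-parallel operation, which is what the layered, mask-based structure is meant to expose.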

6. Parallel Work-Efficient DP via Cordon Algorithm

The Cordon Algorithm provides a generalized, round-parallel framework for achieving (nearly) work-efficient parallel DP for problems admitting optimized sequential algorithms (Ding et al., 2024). The algorithm operates on a DAG induced by the DP recurrence: $D[i] = \min_{j < i} f_{i,j}(D[j]).$ In each round, Cordon identifies a maximal "ready" frontier of states and finalizes their values in parallel, leveraging binary search and compressed data structures to avoid superfluous work. Applications include Longest Increasing Subsequence, sparse LCS, convex and concave GLWS, and optimal alphabetic tree construction, with theoretical work bounds that asymptotically match, or nearly match, the optimized sequential work for key problem classes, alongside bounds on the span.
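The round structure can be illustrated on Longest Increasing Subsequence with a sequential simulation. This is only a sketch of the "ready frontier" idea, not the Cordon Algorithm itself: it assumes the strictly increasing variant, uses a plain scan instead of parallel primitives and compressed data structures, and the function name is hypothetical. In each round, the ready states are the remaining elements with no smaller remaining element to their left, and all of them receive the current round number as their DP value.

```python
def lis_by_rounds(a):
    """Return (dp, rounds), where dp[i] is the length of the longest strictly
    increasing subsequence ending at index i, computed frontier by frontier."""
    dp = [0] * len(a)
    remaining = list(range(len(a)))
    rounds = 0
    while remaining:
        rounds += 1
        frontier, rest = [], []
        running_min = float("inf")
        for i in remaining:
            # Ready: no earlier remaining element is strictly smaller than a[i].
            if a[i] <= running_min:
                frontier.append(i)
            else:
                rest.append(i)
            running_min = min(running_min, a[i])
        for i in frontier:  # independent updates; a parallel loop in principle
            dp[i] = rounds
        remaining = rest
    return dp, rounds

dp, rounds = lis_by_rounds([3, 1, 4, 1, 5, 9, 2, 6])
```

The number of rounds equals the LIS length, so the depth of the computation is governed by the answer rather than by the input size; this simulation does quadratic work in the worst case, which is the kind of overhead the data structures in the actual algorithm are meant to avoid.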

7. Automated Formulation and Modeling of DP with LLMs

DP-Rewrite methodology extends to automating the formulation of DP models via LLMs. DPLM, a 7B-parameter DP-specialized LLM, is trained on data produced by the DualReflect pipeline, which generates high-quality training data by combining forward (diverse, potentially noisy) and backward (guaranteed correct, seed-limited) synthetic instance generation (Zhou et al., 15 Jul 2025). DPLM is reported to achieve state-of-the-art performance on hard instances, outperforming much larger general-purpose LLMs. The explicit modeling (and rewrite) of Bellman recurrences, code, and exact model formulations is central to this approach, highlighting the practical importance of DP-rewrite concepts in automating DP modeling.

References:

(Rozada et al., 29 Jul 2025) Unrolling Dynamic Programming via Graph Filters
(Little et al., 2021) Dynamic programming by polymorphic semiring algebraic shortcut fusion
(Kolarijani et al., 2020) Fast Approximate Dynamic Programming for Input-Affine Dynamics
(Zhao et al., 22 Nov 2025) GPU-based Split algorithm for Large-Scale CVRPSD
(Kátai, 2010) Modelling dynamic programming problems by generalized d-graphs
(Ding et al., 2024) Parallel and (Nearly) Work-Efficient Dynamic Programming
(Zhou et al., 15 Jul 2025) Auto-Formulating Dynamic Programming Problems with LLMs