
Dual Model Predictive Controller (DMPC)

Updated 12 November 2025
  • Dual Model Predictive Controller (DMPC) is a control framework that balances exploration for parameter learning with exploitation for regulation in uncertain dynamic systems.
  • It employs scenario-tree rollouts with Bayesian updates to optimize control actions over distinct sub-horizons, effectively addressing both probing and tracking objectives.
  • DMPC has demonstrated superior performance over conventional certainty-equivalent MPC by achieving faster parameter convergence and robust constraint satisfaction in nonlinear benchmarks.

A Dual Model Predictive Controller (DMPC) is a model predictive control scheme that explicitly incorporates the dual effect arising from the interplay between regulation/tracking performance and active system identification. This framework addresses optimal control problems for partially unknown systems, where it is necessary to balance system input excitation (exploration) for improved parameter learning with the classical control objective of regulation (exploitation). In contrast to conventional certainty-equivalent (CE) MPC, which neglects future informational gains from probing, DMPC methods embed this duality by integrating parameter or model uncertainty, learning dynamics, and value-of-information into the MPC optimization (Arcari et al., 2019, Arcari et al., 2019, Baltussen et al., 11 Nov 2025).

1. Theoretical Motivation: Dual Control and its Intractability

The dual control paradigm, formulated by Feldbaum and formalized via Bellman's stochastic dynamic programming (DP), seeks control inputs that minimize an expected cumulative cost subject to both state and parameter uncertainty. For a general discrete-time system that is linear in an unknown parameter vector $\theta$, with process noise $w_k$, the dynamics are

$$x_{k+1} = \Phi(x_k, u_k)\,\theta + w_k, \qquad \theta \sim P[\theta],$$

and the information state $I_k$ (the sequence of states and controls observed up to time $k$) is propagated alongside the state. The cost-to-go under dual control is expressed recursively as

$$J_k^*(I_k) = \min_{\pi_k} \mathbb{E}_{\theta, w_k}\Big[\, l_k(x_k, \pi_k(x_k)) + J_{k+1}^*(I_{k+1}) \,\Big|\, I_k \Big],$$

which naturally nests minimization (over controls) and expectation (over uncertain parameters and noise) steps. The dual effect manifests as a trade-off between control for regulation and control for probing: the controller's actions affect both current performance and future informativeness through Bayesian posterior updates. For all but trivial state/parameter spaces and short horizons, this "min–E–min–E…" structure is computationally intractable (Arcari et al., 2019).
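
To make the nesting concrete, here is a toy sketch (all names, the dynamics $x_{k+1} = \theta u_k + w_k$, and the coarse discretization are illustrative, not drawn from the cited papers) that evaluates the min–E recursion exactly for a scalar system with a two-point parameter prior; the branching over controls, parameters, and noise at every level is what makes the exact recursion explode combinatorially.

```python
import numpy as np

# Toy dual-control DP (illustrative): x_{k+1} = theta * u_k + w_k, with
# theta in {-1, +1}, w_k uniform on a 3-point grid, and quadratic stage cost.
THETAS = np.array([-1.0, 1.0])
NOISES = np.array([-0.1, 0.0, 0.1])
CONTROLS = np.linspace(-1.0, 1.0, 5)

def posterior(b, u, x_next):
    """Bayesian update of b = P[theta = +1] after observing x_next."""
    like = np.array([np.isclose(NOISES, x_next - th * u).any() for th in THETAS])
    post = like * np.array([1.0 - b, b])
    return b if post.sum() == 0 else post[1] / post.sum()

def J(b, k, horizon):
    """Dual cost-to-go: min over u of E over (theta, w) of stage cost plus the
    cost-to-go at the *updated* belief -- the nested min-E-min-E structure."""
    if k == horizon:
        return 0.0
    best = np.inf
    for u in CONTROLS:
        exp_cost = 0.0
        for th, p_th in zip(THETAS, (1.0 - b, b)):
            for w in NOISES:
                x_next = th * u + w                  # hypothesized transition
                b_next = posterior(b, u, x_next)     # information update
                exp_cost += (p_th / len(NOISES)) * (
                    x_next**2 + 0.1 * u**2 + J(b_next, k + 1, horizon))
        best = min(best, exp_cost)
    return best

# 5 controls x 2 parameters x 3 noise values per stage: ~30^N evaluations.
print(J(b=0.5, k=0, horizon=3))
```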

2. Scenario-Tree Rollout and Problem Decomposition

To render the dual control approach feasible within an MPC framework, modern DMPC implementations split the finite prediction horizon into two segments:

  • Dual (Exploration) Sub-horizon: Over $L$ short steps ($L \ll N$), the controller simulates hypothetical future observations under the current uncertainty, updating parameter beliefs on a scenario tree constructed via forward sampling of parameter/noise realizations and recursive Bayesian inference.
  • Exploitation (Regulation) Sub-horizon: Beyond step $L$, the controller fixes the posterior estimates and propagates the system using an open-loop or certainty-equivalent sequence, optimizing purely for regulation (Arcari et al., 2019, Arcari et al., 2019).

At each decision node of the scenario tree, system trajectories are branched according to sampled parameter realizations and noise. Posterior means and covariances of the system parameters are updated by the (linear-Gaussian) Bayesian recursion

$$\Sigma_{\theta_{k+1}}^{-1} = \Sigma_{\theta_k}^{-1} + \Phi(x_k,u_k)^\top \Sigma_w^{-1} \Phi(x_k,u_k), \qquad \mu_{\theta_{k+1}} = \Sigma_{\theta_{k+1}} \left( \Sigma_{\theta_k}^{-1} \mu_{\theta_k} + \Phi(x_k,u_k)^\top \Sigma_w^{-1} x_{k+1} \right),$$

or, for structural uncertainty, both categorical mode probabilities and continuous parameter posteriors are propagated (Arcari et al., 2019).
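
In code, this recursion is a one-line information-matrix update; the following is a minimal numpy transcription (function and variable names are mine):

```python
import numpy as np

def bayes_update(mu, Sigma, Phi, Sigma_w, x_next):
    """Linear-Gaussian recursive update of the parameter posterior.

    mu, Sigma : current posterior mean / covariance of theta
    Phi       : regressor matrix Phi(x_k, u_k), shape (n_x, n_theta)
    Sigma_w   : process-noise covariance, shape (n_x, n_x)
    x_next    : observed (or hypothesized, on the tree) successor state
    """
    Sw_inv = np.linalg.inv(Sigma_w)
    info_prior = np.linalg.inv(Sigma)                 # Sigma_{theta_k}^{-1}
    info_post = info_prior + Phi.T @ Sw_inv @ Phi     # information-matrix update
    Sigma_next = np.linalg.inv(info_post)
    mu_next = Sigma_next @ (info_prior @ mu + Phi.T @ Sw_inv @ x_next)
    return mu_next, Sigma_next
```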

The scenario tree enables parallel evaluation of the cost and state evolution along all branches. Control variables $\mathbf{u}_k = \{u_k^{j_k}\}$ at each tree node are jointly optimized over all tree branches up to depth $L$, and for each branch, a trailing exploitation sequence is optimized under the fixed parameter/model belief.
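
A minimal sketch of the forward-sampled tree construction, reusing the `bayes_update` helper above (the `TreeNode` structure and the pilot input used to shape the tree are illustrative; in the actual NLP the node inputs are decision variables):

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    x: np.ndarray            # state at this node
    mu: np.ndarray           # parameter posterior mean
    Sigma: np.ndarray        # parameter posterior covariance
    prob: float              # path probability (uniform under plain sampling)
    children: list = field(default_factory=list)

def grow(node, depth, n_branches, u_pilot, Phi, Sigma_w, rng):
    """Branch on sampled (theta, w) realizations and Bayes-update each child.

    Assumes bayes_update(...) from the sketch above is in scope. u_pilot is a
    placeholder input used only to shape the tree here.
    """
    if depth == 0:
        return node
    for _ in range(n_branches):
        theta = rng.multivariate_normal(node.mu, node.Sigma)  # sample parameter
        w = rng.multivariate_normal(np.zeros(len(node.x)), Sigma_w)
        Phi_k = Phi(node.x, u_pilot)
        x_next = Phi_k @ theta + w                            # branch dynamics
        mu_next, Sigma_next = bayes_update(node.mu, node.Sigma,
                                           Phi_k, Sigma_w, x_next)
        child = TreeNode(x_next, mu_next, Sigma_next, node.prob / n_branches)
        node.children.append(grow(child, depth - 1, n_branches,
                                  u_pilot, Phi, Sigma_w, rng))
    return node
```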

3. Dual MPC Optimization Formulation

The DMPC scheme is realized as a single structured nonlinear program that captures both exploration and exploitation:

$$\min_{\{\mathbf{u}_0, \dots, \mathbf{u}_{L-1}\},\, \{u_{L:N-1}^{j_L}\}} \; \sum_{k=0}^{L-1} \frac{1}{N_s^k} \sum_{j_k=1}^{N_s^k} l_k\big(x_k^{j_k}, u_k^{j_k}\big) \;+\; \frac{1}{N_s^L} \sum_{j_L=1}^{N_s^L} J_L\big(I_L^{j_L}, u_{L:N-1}^{j_L}\big),$$

subject to the tree-consistent dynamics, Bayesian updates, and input/state constraints on all branches (Arcari et al., 2019).

For systems with both parametric and structural uncertainties (e.g., multiple discrete modes), branch probabilities $\bar{p}_{j_{k+1}}$ are recursively computed by weighting each scenario with the mode's posterior probability, and the exploitation tail applies certainty-equivalent or first-moment approximations (Arcari et al., 2019).
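
A sketch of this recursive probability weighting, assuming Gaussian process noise so that each mode's likelihood is a normal density (the per-mode `dynamics` list and all names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mode_posterior(p_modes, x, u, x_next, dynamics, Sigma_w):
    """Posterior mode probabilities after observing x_next.

    p_modes  : prior probabilities of the discrete modes
    dynamics : list of per-mode prediction functions f_m(x, u)
    """
    like = np.array([
        multivariate_normal.pdf(x_next, mean=f(x, u), cov=Sigma_w)
        for f in dynamics
    ])
    post = like * p_modes
    return post / post.sum()

# A child branch j_{k+1} spawned from parent j_k under mode m then carries
# p_bar[j_{k+1}] = p_bar[j_k] * mode_posterior(...)[m].
```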

Optimization is typically performed using NLP solvers (e.g., IPOPT), exploiting the problem’s block-sparse structure stemming from the scenario tree construction.
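
As a concrete (toy) illustration of how such a tree-structured NLP can be posed, the following CasADi sketch builds a one-step exploration tree ($L = 1$, $N_s = 3$ sampled parameter values) with certainty-equivalent tails and hands it to IPOPT; the scalar dynamics and all numbers are placeholders, not a reproduction of the cited formulations.

```python
import casadi as ca

# Toy tree: a root exploration input (L = 1) branches into Ns scenarios, each
# carrying an N-step certainty-equivalent tail at the posterior mean mu0.
Ns, N, x0, mu0 = 3, 10, 1.5, 0.8
thetas = [0.6, 0.8, 1.0]          # sampled parameter realizations, one per branch

opti = ca.Opti()
u_root = opti.variable()          # exploration input shared by all branches
cost = 0
for j in range(Ns):
    u_tail = opti.variable(N)     # per-branch exploitation inputs
    x = thetas[j] * x0 + u_root   # branch on the sampled theta
    cost += (x**2 + 0.1 * u_root**2) / Ns
    for k in range(N):
        x = mu0 * x + u_tail[k]   # tail propagated at the fixed belief
        cost += (x**2 + 0.1 * u_tail[k]**2) / Ns
    opti.subject_to(opti.bounded(-1, u_tail, 1))
opti.subject_to(opti.bounded(-1, u_root, 1))
opti.minimize(cost)
opti.solver("ipopt")
sol = opti.solve()
print(sol.value(u_root))          # first-stage input applied to the plant
```

The per-branch tail variables couple only through the shared root input, which is what produces the block-sparse structure the solver exploits.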

4. Implementation, Complexity, and Scalability

The key to tractability is careful selection of the exploration horizon length $L$ (typically $L = 1, 2, 3$) and the scenario branching factor $N_s$ (tens of branches can be sufficient in practice). The computational complexity of the dual part scales as $O(N_s^L)$. For the exploitation part, costs are quadratic or linear and can be solved efficiently via Riccati recursions or direct methods. Because dual MPC merges all exploration and exploitation decisions into a single optimization, it provides better closed-loop performance and faster parameter learning than CE-MPC or passive adaptive schemes, even for relatively short horizons or low scenario counts (Arcari et al., 2019, Arcari et al., 2019, Baltussen et al., 11 Nov 2025).
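
A quick check of this scaling (the specific values of $L$ and $N_s$ below are arbitrary):

```python
# Decision nodes in the exploration tree: sum_{k=0}^{L-1} Ns^k, plus Ns^L
# certainty-equivalent tails -- exponential in L, which is why L stays small.
for L in (1, 2, 3):
    for Ns in (5, 10, 30):
        nodes = sum(Ns**k for k in range(L))
        print(f"L={L}, Ns={Ns}: {nodes} tree nodes, {Ns**L} exploitation tails")
```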

In pseudocode, each receding-horizon step typically proceeds as follows:

For each time t:
    1. Measure current state; update parameter/model beliefs.
    2. Build scenario tree of depth L from current information.
    3. Propagate states and update beliefs at every tree node.
    4. Jointly optimize control decisions over exploration and exploitation phases.
    5. Apply the first-stage control from the root node.
    6. Observe new state; update information; repeat.
(Arcari et al., 2019)
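
The following is a self-contained toy realization of this loop for a scalar system $x_{k+1} = \theta x_k + u_k + w_k$; it replaces the full tree NLP with a sampled one-step dual cost minimized over a gridded input set, so it is a behavioral sketch of the receding-horizon structure under those simplifications, not the method of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, Sigma_w = 0.7, 0.05     # plant parameter (unknown to controller); noise var
mu, var = 0.0, 1.0                  # Gaussian prior on the scalar theta
x = 2.0
U = np.linspace(-1, 1, 21)          # gridded input set

def dual_cost(u, x, mu, var, n_samp=20, tail=5):
    """Sampled one-step dual cost (exploration depth L = 1): stage cost plus a
    certainty-equivalent tail evaluated at the *updated* posterior mean."""
    total = 0.0
    for _ in range(n_samp):
        th = rng.normal(mu, np.sqrt(var))
        xn = th * x + u + rng.normal(0, np.sqrt(Sigma_w))
        # scalar Bayesian update with regressor phi = x (see Section 2)
        prec = 1 / var + x**2 / Sigma_w
        mu_n = (mu / var + x * (xn - u) / Sigma_w) / prec
        # cheap CE tail: regulate with u_k = -mu_n * x_k under the updated belief
        xt, tail_cost = xn, 0.0
        for _ in range(tail):
            ut = -mu_n * xt
            xt = mu_n * xt + ut
            tail_cost += xt**2 + 0.1 * ut**2
        total += xn**2 + 0.1 * u**2 + tail_cost
    return total / n_samp

for t in range(15):                 # receding-horizon loop
    u = min(U, key=lambda u: dual_cost(u, x, mu, var))   # optimize over tree
    xn = theta_true * x + u + rng.normal(0, np.sqrt(Sigma_w))  # apply to plant
    prec = 1 / var + x**2 / Sigma_w  # update beliefs with the real measurement
    mu, var = (mu / var + x * (xn - u) / Sigma_w) / prec, 1 / prec
    x = xn
    print(f"t={t}: u={u:+.2f}, x={x:+.3f}, mu={mu:+.3f}, var={var:.4f}")
```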

5. Robustness, Safe Learning, and Constraints

Robust constraint satisfaction is typically integrated by combining DMPC with tube-based robust MPC constructs, forming contingency or “robust” horizons parallel to the exploration horizon (Baltussen et al., 11 Nov 2025). The tube construction ensures that, even as the system actively excites for information, fallback plans and constraint-tightenings guarantee that all state and input constraints are satisfied for the true (unknown) system within prescribed uncertainty sets.

Active learning DMPC approaches using Gaussian processes incorporate a robust contingency plan, a learning objective (e.g., maximizing GP posterior variance along the horizon), and a deterioration-budget constraint, so that the controller explores only as much as permitted by a pre-specified bound on the loss in regulation performance (Baltussen et al., 11 Nov 2025).
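
A minimal numpy sketch of the GP-variance learning objective (the RBF kernel, its hyperparameters, and the budget inequality in the comment are illustrative assumptions, not the construction of Baltussen et al.):

```python
import numpy as np

def rbf(A, B, ls=0.5, sf=1.0):
    """Squared-exponential kernel matrix between row-stacked inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2 / ls**2)

def gp_posterior_var(Z_cand, Z_data, noise=1e-2):
    """GP posterior variance at candidate inputs given observed inputs."""
    K = rbf(Z_data, Z_data) + noise * np.eye(len(Z_data))
    k = rbf(Z_cand, Z_data)
    return np.diag(rbf(Z_cand, Z_cand) - k @ np.linalg.solve(K, k.T))

# Exploration objective: maximize summed posterior variance along the predicted
# trajectory, subject to a deterioration budget on the regulation cost, e.g.
#   J_reg(u_explore) <= (1 + beta) * J_reg(u_nominal),  beta = 0.05,
# so probing is only allowed while the regulation loss stays within budget.
Z_data = np.array([[0.0, 0.0], [0.5, 0.2]])     # visited (state, input) pairs
Z_cand = np.array([[0.4, 0.1], [1.0, -0.3]])    # candidate trajectory points
print(gp_posterior_var(Z_cand, Z_data))         # larger variance = more informative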

6. Application Areas and Benchmarks

Dual MPC has been successfully demonstrated on classical benchmarks such as

  • Scalar LTI systems with sign-ambiguous or poorly known parameters
  • Nonlinear control tasks (e.g., mountain-car swing-up, altitude-control under actuator faults)
  • Nonlinear systems with nonparametric uncertainties modeled via GPs (e.g., mass-spring-damper with unknown nonlinearities)

In these studies, dual MPC has been shown to

  • Actively and optimally excite the system to resolve ambiguities faster than CE-MPC, as evidenced by faster goal achievement and identification in the mountain-car and actuator-fault scenarios (Arcari et al., 2019, Arcari et al., 2019)
  • Avoid getting trapped in poor local minima, as can happen with aS-MPC or naive learning schemes
  • Maintain robust constraint satisfaction throughout aggressive exploration (Baltussen et al., 11 Nov 2025)

Performance metrics reported include first-trial success rates, closed-loop costs, trajectory tracking errors, rate of parameter learning, and rates of constraint violations (which are typically zero for tube-based robust dual MPC).

7. Comparative Analysis and Concluding Remarks

Dual MPC methods outperform certainty-equivalent and sample-average adaptive MPC in learning efficiency and robustness in the face of model uncertainty. Practical implementations trade off exploration horizon depth against computational resources, but near-optimal policies and rapid learning are attainable with short $L$ and moderate $N_s$. When implemented with robust (tube-based) schemes and active-learning constraints, DMPC delivers a principled balance of exploration, exploitation, and safety, enabling application to nonlinear, multi-modal, and nonparametric systems under real-time constraints (Arcari et al., 2019, Arcari et al., 2019, Baltussen et al., 11 Nov 2025).
