
Multiple Shooting Method

Updated 17 February 2026
  • Multiple Shooting Method is a numerical technique that decomposes a problem’s domain into subintervals to enhance stability and convergence.
  • It independently solves local IVPs or optimal control problems on each segment and enforces continuity with equality constraints.
  • The method improves parallelism and robustness, making it effective for parameter estimation, neural ODE training, and control in stiff or chaotic systems.

The multiple shooting method is a family of algorithms for solving initial value problems (IVPs), boundary value problems (BVPs), parameter estimation, and optimal control problems in ordinary, partial, or stochastic differential equations. It decomposes the solution time or spatial domain into multiple subintervals, solves local problems independently, and enforces global consistency by coupling the solutions via equality constraints. This approach yields improvements in numerical stability, parallelism, and convergence behavior, particularly for stiff, highly nonlinear, chaotic, or long-horizon systems. Multiple shooting forms the mathematical foundation of numerous modern algorithms for ODE-constrained optimization, nonlinear optimal control, PDE-constrained design, neural differential equation fitting, and systems identification.

1. Mathematical Formulation and Core Principles

Consider an ODE IVP $\dot{x}(t) = f(t, x, \theta)$, $x(t_0) = x_0$. In the multiple shooting framework:

  • The interval $[t_0, T]$ is partitioned as $t_0 < t_1 < \cdots < t_m = T$.
  • For each interval $[t_i, t_{i+1}]$, introduce a separate initial value $s_i \approx x(t_i)$.
  • The local IVPs $\dot{x} = f(t, x, \theta)$ with $x(t_i) = s_i$ are solved independently.
  • Continuity constraints $c_i(s_i, s_{i+1}, \theta) = x(t_{i+1}; s_i, \theta) - s_{i+1} = 0$ couple the intervals and yield a nonlinear system in the variables $q = (\theta, s_0, \ldots, s_m)$ (Carbonell et al., 2015).
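The resulting nonlinear system can be solved with a standard root finder. Below is a minimal sketch (not taken from the cited papers) using SciPy, with the simple linear test ODE $\dot{x} = -x$ standing in for $f$; solving the stacked continuity residuals recovers the shooting nodes:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import fsolve

def f(t, x):
    return -x  # simple linear test ODE, x' = -x

t_nodes = np.linspace(0.0, 2.0, 5)  # t_0 < t_1 < ... < t_m = T, with m = 4
x0 = 1.0                            # known initial value pins s_0

def continuity_residual(s_free):
    # s = (s_0, ..., s_m); s_0 is fixed, s_1..s_m are unknowns
    s = np.concatenate(([x0], s_free))
    res = []
    for i in range(len(t_nodes) - 1):
        sol = solve_ivp(f, (t_nodes[i], t_nodes[i + 1]), [s[i]],
                        rtol=1e-10, atol=1e-12)
        res.append(sol.y[0, -1] - s[i + 1])  # c_i = x(t_{i+1}; s_i) - s_{i+1}
    return np.array(res)

s_free = fsolve(continuity_residual, np.ones(len(t_nodes) - 1))
# for x' = -x, the converged shooting nodes match exp(-t_i)
```

For this linear test problem the true solution is $x(t) = e^{-t}$, so the converged nodes can be checked directly against $e^{-t_i}$.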

For parameter estimation from noisy data yijy_{ij},

$$
J(q) = \frac{1}{2} \sum_{i,j} \bigl\| y_{ij} - g\bigl(t_{ij}, x(t_{ij}; s_i, \theta), \theta\bigr) \bigr\|^2, \qquad \text{subject to } c_i(q) = 0.
$$

Similarly, in discrete-time optimal control,

$$
\begin{aligned}
\min_{x_0, \ldots, x_N;\ u_0, \ldots, u_{N-1}} \quad & \sum_{i=0}^{N-1} l_i(x_i, u_i) + l_N(x_N) \\
\text{s.t.} \quad & x_0 = \bar{x}_0, \qquad x_{i+1} - f_i(x_i, u_i) = 0, \quad i = 0, \ldots, N-1
\end{aligned}
$$

(Baumgärtner et al., 2023, Giftthaler et al., 2017).
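As a concrete instance of this transcription, the following sketch (illustrative only, not from the cited works) solves a tiny discrete-time problem with scalar dynamics $f_i(x_i, u_i) = x_i + u_i$ and quadratic stage costs, keeping the states as explicit decision variables constrained by the defect equations and handing the NLP to SciPy's SLSQP solver:

```python
import numpy as np
from scipy.optimize import minimize

N = 5
xbar0 = 1.0  # prescribed initial state

def unpack(z):
    return z[:N + 1], z[N + 1:]  # states x_0..x_N, controls u_0..u_{N-1}

def cost(z):
    x, u = unpack(z)
    # quadratic stage costs plus terminal cost
    return np.sum(x[:-1] ** 2 + u ** 2) + x[-1] ** 2

def defects(z):
    x, u = unpack(z)
    dyn = x[1:] - (x[:-1] + u)  # x_{i+1} - f_i(x_i, u_i) with f_i = x_i + u_i
    return np.concatenate(([x[0] - xbar0], dyn))

z0 = np.zeros(2 * N + 1)  # infeasible start; the equality constraints fix it
res = minimize(cost, z0, method="SLSQP",
               constraints={"type": "eq", "fun": defects})
x_opt, u_opt = unpack(res.x)
```

Structure-exploiting solvers replace the dense SLSQP step with Riccati-type recursions, but the transcription itself is exactly this: states and controls as variables, dynamics as equality constraints.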

This decoupling makes the problem more tractable for stiff, chaotic, or sensitive systems, as local integration error or instability does not propagate globally before being corrected by the coupling constraints.

2. Numerical Algorithms and Sensitivity Integration

Multiple shooting steps typically involve:

  1. Initialization: Choose starting values for all interval initial states (from data, interpolation, or simulation).
  2. Forward integration: Independently solve local IVPs on each interval to compute sub-trajectory endpoints and, if required, variational equations for sensitivities with respect to both shooting nodes and parameters (Carbonell et al., 2015, Aydogmus et al., 2020).
  3. Assembly: Compute residual vectors for data misfit and continuity constraints, and construct associated Jacobians.
  4. Solution of the linearized system: Formulate and solve a constrained Gauss–Newton or SQP subproblem for updates to all variables, often exploiting the banded block structure of the overall KKT matrix (Baumgärtner et al., 2023, Giftthaler et al., 2017).
  5. Line search or globalization: Use a merit function and backtracking to enforce convergence in the presence of nonlinearity or noise (Carbonell et al., 2015, Li et al., 2023).
  6. Update: Apply the step, update variables, and repeat until convergence.
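The steps above can be condensed into a small working example. The sketch below (an illustrative penalty formulation, not the exact constrained Gauss–Newton of the cited papers) estimates the rate $\theta$ of $\dot{x} = -\theta x$ from noisy data by stacking data-misfit and weighted continuity residuals for a Gauss–Newton-type solver (`scipy.optimize.least_squares`); all names and the penalty weight `mu` are illustrative:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
theta_true = 1.5
t_nodes = np.linspace(0.0, 3.0, 4)   # three shooting intervals
t_data = np.linspace(0.0, 3.0, 16)
y_data = np.exp(-theta_true * t_data) + 0.01 * rng.standard_normal(t_data.size)

def residuals(q, mu=1e2):
    theta, s = q[0], q[1:]           # q = (theta, s_0, ..., s_{m-1})
    r = []
    for i in range(len(t_nodes) - 1):
        lo, hi = t_nodes[i], t_nodes[i + 1]
        last = i == len(t_nodes) - 2
        sel = (t_data >= lo) & ((t_data <= hi) if last else (t_data < hi))
        sol = solve_ivp(lambda t, x: -theta * x, (lo, hi), [s[i]],
                        dense_output=True, rtol=1e-9, atol=1e-11)
        r.extend(sol.sol(t_data[sel])[0] - y_data[sel])  # data misfit
        if not last:
            # penalized continuity defect between segments i and i+1
            r.append(np.sqrt(mu) * (sol.y[0, -1] - s[i + 1]))
    return np.array(r)

# step 1: initialize the shooting nodes from the data itself
q0 = np.concatenate(([1.0], np.interp(t_nodes[:-1], t_data, y_data)))
fit = least_squares(residuals, q0)   # steps 2-6 inside the solver
theta_hat = fit.x[0]
```

Initializing the shooting nodes by interpolating the data is what gives multiple shooting its robustness here: each segment starts near the measurements rather than wherever a long forward simulation happens to land.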

Efficient sensitivity analysis is critical for parameter estimation and optimal control:

  • The local linearization approach propagates state and derivative information simultaneously, using matrix exponentials of block-augmented Jacobian matrices to ensure stable and first-order accurate derivatives with minimal computational overhead (Carbonell et al., 2015).
  • The adjoint method can be exploited for ODE-constrained optimization with many parameters, reducing the computational cost from $O(mp)$ to $O(m)$ per shooting interval for $p$ parameters (Aydogmus et al., 2020).
  • For stochastic problems, strongly convergent SDE integrators and adaptive selection of shooting nodes based on drift and diffusion contributions yield stability and accuracy (Bastani et al., 2013).
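To illustrate the forward (variational) approach on a minimal example: for $\dot{x} = -\theta x$, the sensitivity $v = \partial x / \partial\theta$ obeys the variational equation $\dot{v} = \frac{\partial f}{\partial x} v + \frac{\partial f}{\partial \theta} = -\theta v - x$, and state and sensitivity can be integrated together as one augmented system (an illustrative sketch, not the matrix-exponential scheme of the cited work):

```python
import numpy as np
from scipy.integrate import solve_ivp

theta = 1.5

def augmented(t, z):
    x, v = z
    return [-theta * x,        # state dynamics f = -theta * x
            -theta * v - x]    # variational eq: v' = f_x * v + f_theta

# x(0) = 1, and dx/dtheta(0) = 0 since the initial state is parameter-free
sol = solve_ivp(augmented, (0.0, 2.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
x_T, sens_T = sol.y[:, -1]
# analytic check: x(T) = e^{-theta*T}, dx/dtheta(T) = -T * e^{-theta*T}
```

The same augmentation carries over to the shooting-node sensitivities $\partial x / \partial s_i$, which is what fills the Jacobian blocks of the continuity constraints.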

3. Theoretical Properties and Convergence

The multiple shooting method transforms a high-dimensional, globally sensitive problem into a block-structured, weakly coupled system, yielding:

  • Improved numerical stability, especially for stiff, highly sensitive, or chaotic ODEs, since local errors are restricted to each interval before being corrected (Carbonell et al., 2015, Jordana et al., 2021, Iakovlev et al., 2022).
  • Broader convergence basins and enhanced robustness with respect to poor or infeasible initial guesses, as local states can adjust independently before being reconciled (Carbonell et al., 2015, Zhuang et al., 2021).
  • For Gauss–Newton and generalized Gauss–Newton Hessian approximations, asymptotic local linear convergence rates are identical for multiple shooting, single shooting, and DDP (iLQR) under standard regularity conditions (Baumgärtner et al., 2023). Exact Hessians yield quadratic convergence.
  • The block-banded or block-tridiagonal structure of linearizations enables efficient solution (e.g., Riccati recursions in optimal control) and parallelization (Giftthaler et al., 2017, Petersson et al., 2024).

4. Applications and Domain-Specific Adaptations

Parameter Estimation in ODEs

Multiple shooting is standard for fitting ODE models to noisy, sparse, or partial time-series data (Carbonell et al., 2015, Aydogmus et al., 2020, Zhuang et al., 2021). The method is robust against local minima and can handle chaotic regimes where single shooting fails.

Optimal Control (Direct Multiple Shooting)

Discrete-time direct multiple shooting transcribes optimal control problems to large but structured constrained finite-dimensional NLPs, permitting explicit or implicit ODE integration on each interval and straightforward handling of path and boundary constraints (Puchaud et al., 2023, Baumgärtner et al., 2023, Giftthaler et al., 2017). DMS is broadly used in biomechanics, robotics, and quantum control (Puchaud et al., 2023, Petersson et al., 2024). Key features:

  • Integration errors and constraint defects are contained locally and corrected between intervals, preventing divergence.
  • Hybrid MS–SS and penalty-regularized variants address tradeoffs between parallelism, defect enforcement, and step sizes (Li et al., 2023).

Neural Differential Equations

Training neural ODEs on long, irregular, or oscillatory trajectories is problematic for single shooting due to nonconvex loss surfaces, a bias toward “flattened” trajectories, and exploding or vanishing gradients. Multiple shooting for neural ODEs splits trajectories into manageable pieces, optimizes state and parameter variables jointly, and enforces continuity by penalty or augmented Lagrangian methods (Turan et al., 2021, Iakovlev et al., 2022, Jordana et al., 2021, Prabhu et al., 31 May 2025).

Geometric Mechanics and PDEs

Multiple shooting is well suited to high-dimensional, stiff, or multiphysics BVPs such as:

  • 3D Cosserat rod problems with large rotations and geometric discontinuities, using block-matrix parameterizations (e.g., MRP/SMRP) for numerical robustness (Florian et al., 2024).
  • Boundary value problems in optimal mass transport (Wasserstein geodesics), where structure-preserving time-parallel multiple shooting with continuation enables robust solution and conservation of physical invariants (Cui et al., 2021).

Specialized Domains

  • Adaptive node selection in stochastic SDE-BVPs, based on local error monitoring of drift vs. diffusion, yields accuracy and efficiency unattainable with uniform discretizations (Bastani et al., 2013).
  • Fast algorithms for periodic orbits in symplectic mappings embed Floquet vector equations into the multiple shooting system, decoupling the corrections into low-dimensional recurrences and eliminating large linear solves (Kumar, 1 Jan 2026).

5. Practical Implementation and Parallelization

Multiple shooting is highly amenable to parallel implementation, especially in large-scale, high-dimensional, or stiff systems:

  • Each subinterval's IVP or local integration can be executed independently, with boundary variables exchanged at synchronization points (Petersson et al., 2024, Massaroli et al., 2021).
  • For time-parallel quantum control, an 80× speedup in gradient evaluation is reported for 4-qubit QFT gate optimization, owing to MPI parallelization of both forward propagation and adjoint backpropagation (Petersson et al., 2024).
  • In direct multiple shooting for optimal control and neural ODE training, block sensitivity computations and gradient assembly can be performed in parallel, yielding significant improvements in wall-clock time and scalability (Prabhu et al., 31 May 2025, Massaroli et al., 2021).

Penalty methods and augmented Lagrangians for relaxing (softening) hard continuity enforcement are commonly used to improve numerical conditioning and convergence for large systems, with adaptive penalty weights and merit functions to balance data fit and defect reduction (Li et al., 2023, Turan et al., 2021).
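As a minimal illustration of segment-level parallelism (a sketch using Python threads in place of MPI; the test ODE and names are illustrative), the independent interval integrations can be dispatched concurrently and their endpoints collected for the coupling step:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.integrate import solve_ivp

def f(t, x):
    return -x  # test ODE x' = -x

def integrate_segment(seg):
    # each segment is an independent IVP: no data shared with other segments
    t0, t1, s0 = seg
    sol = solve_ivp(f, (t0, t1), [s0], rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]

t_nodes = np.linspace(0.0, 2.0, 9)
# segment initial guesses; here set to the exact nodes exp(-t_i) for checking
segments = list(zip(t_nodes[:-1], t_nodes[1:], np.exp(-t_nodes[:-1])))
with ThreadPoolExecutor() as pool:
    endpoints = list(pool.map(integrate_segment, segments))
# with exact node guesses, each endpoint matches the next node value
```

In production codes the worker pool is replaced by MPI ranks or GPU batching, and the synchronization point is where the continuity defects and their Jacobian blocks are assembled.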

6. Empirical Performance and Comparative Studies

Multiple shooting consistently demonstrates superior performance over single shooting and naively integrated direct methods in challenging regimes:

  • Robust recovery of true parameters in chaotic system identification (Henon–Heiles, Rikitake, FitzHugh–Nagumo) with bias below $10^{-3}$ and a high convergence rate, compared to frequent failure of single shooting (Carbonell et al., 2015).
  • For nonlinear optimal control, MS variants achieve faster local contraction rates and shorter runtimes than classic iLQR, and sustain real-time MPC rates at larger problem scales (Giftthaler et al., 2017, Li et al., 2023).
  • In biomechanics, direct multiple shooting is as accurate as direct collocation for most OCPs, with explicit RK4 being faster but less dynamically consistent; the methods are recommended to be used complementarily depending on dynamic consistency requirements and available prior knowledge (Puchaud et al., 2023).
  • For time-parallel applications and high-dimensional quantum systems, multiple shooting achieves orders-of-magnitude speedups in per-iteration runtime with no loss in final accuracy, enabling solution of previously intractable problems (Petersson et al., 2024).
  • Adaptive multiple shooting in stochastic boundary value problems significantly outperforms fixed-step and collocation methods in both mean-square and strong error, especially in drift- or diffusion-dominated settings (Bastani et al., 2013).

7. Extensions, Limitations, and Future Directions

Recent advances and areas of active research include:

  • Probabilistic multiple shooting via variational and Bayesian formulations, allowing for soft or stochastic continuity enforcement, with amortized inference (Iakovlev et al., 2022).
  • Integration with modern machine learning hardware—GPU, auto-diff, and deep implicit layers—to scale to massive neural ODEs and PDE-constrained learning (Prabhu et al., 31 May 2025, Massaroli et al., 2021).
  • Adaptive mesh refinement and error-based node placement for optimal computational efficiency in time/space-adaptive problems (Bastani et al., 2013).
  • Hamiltonian-preserving time-parallel multiple shooting for conservation-law-driven PDEs (Cui et al., 2021).

The main limitations are the larger number of variables and nonlinear constraints relative to single shooting, which demands more capable solver architectures and possibly more iterations, though this cost is mitigated by effective condensing, parallelization, and robust algorithms. The number of shooting intervals is itself a tradeoff: too few, and numerical instability can reappear; too many, and the NLP grows to a size that may require sophisticated interior-point or block-SQP solvers (Zhuang et al., 2021, Puchaud et al., 2023).

The multiple shooting method remains a cornerstone for robust, parallel, and scalable numerical solution of complex dynamical systems across scientific computing, control, optimization, and emerging machine learning domains.
