OTA Merging: Trajectory-Informed Optimization
- OTA Merging is a framework that utilizes optimization trajectory statistics such as curvature estimates and observability metrics to merge models and trajectories.
- It employs optimizer-derived measures, like Adam’s second-moment statistics, to guide curvature-aware merges that reduce negative transfer and align updates with the loss landscape.
- Applications span robotics, control, and large-scale machine learning, yielding faster convergence and improved estimation accuracy in multi-modal and hybrid systems.
Optimization Trajectory Aware (OTA) Merging is a class of methods and frameworks that leverage information from optimization trajectories—such as parameter updates, curvature estimates, or trajectory observability metrics—to guide the merging or synthesis of multiple models, trajectories, or objectives. OTA merging is widely applicable across robotics, control, and large-scale machine learning. The unifying principle is that auxiliary statistics or structural knowledge collected during optimization (such as second-moment estimates or observability Gramians) are explicitly used to effect merging that is more robust to interference, respects critical system properties, and enhances final task performance.
1. Curvature-Aware Merging via Optimization Trajectory Statistics
OTA Merging in the model merging context refers to methods that use optimizer-derived information, particularly the second-moment statistics tracked by adaptive optimizers such as Adam, to inform the aggregation of independently fine-tuned models. In the framework of curvature-aware merging (Mahdavinia et al., 14 Sep 2025), each expert model provides a parameter update (“task vector”) and a diagonally-approximated curvature estimate from its optimization trajectory.
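Given a set of experts with parameters $\theta_t$ and base model $\theta_0$, the update per task is:

$$\tau_t = \theta_t - \theta_0$$

With Adam, the per-parameter second-moment estimate $v_t$ acts as a diagonal Fisher or Hessian approximation. The saliency score (Fast Fisher Grafting, FFG) for each parameter $i$ is:

$$s_{t,i} = \tau_{t,i}^{2}\, v_{t,i}$$

which is used to sparsify the update, retaining only high-saliency parameters.

OTA merging then aggregates these (sparsified) updates, reweighting each using its curvature map:

$$\theta_{\text{merged}} = \theta_0 + \Big(\sum_t P_t\Big)^{-1} \sum_t P_t\,(m_t \odot \tau_t)$$

where $P_t = \operatorname{diag}(\sqrt{v_t} + \epsilon)$ and $m_t$ is the binary mask selecting important updates.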
This approach directly utilizes the optimizer’s trajectory information (curvature proxies), mitigating negative transfer when merging and aligning model updates with the local loss landscape (Mahdavinia et al., 14 Sep 2025). Empirical results show improvements over standard averaging, especially when merging models specialized for diverse capabilities.
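The pipeline described in this section (task vector, saliency-based sparsification, curvature-weighted aggregation) can be sketched in NumPy. This is a minimal illustration rather than the authors' implementation: the function names `ffg_mask` and `ota_merge`, the `keep_ratio` parameter, the saliency form (squared update times second moment), and the curvature weight `sqrt(v) + eps` are all assumptions made for the example.

```python
import numpy as np

def ffg_mask(task_vector, second_moment, keep_ratio):
    """Binary mask keeping roughly the top `keep_ratio` fraction by saliency.

    Assumed saliency: squared update times the Adam second-moment estimate.
    """
    saliency = task_vector ** 2 * second_moment
    k = max(1, int(round(keep_ratio * saliency.size)))
    # Value of the k-th largest saliency; ties may keep a few extra entries.
    threshold = np.partition(saliency.ravel(), -k)[-k]
    return (saliency >= threshold).astype(task_vector.dtype)

def ota_merge(base, experts, second_moments, keep_ratio=0.5, eps=1e-8):
    """Add a curvature-weighted average of sparsified task vectors to `base`."""
    numerator = np.zeros_like(base)
    denominator = np.zeros_like(base)
    for theta, v in zip(experts, second_moments):
        delta = theta - base                    # task vector
        mask = ffg_mask(delta, v, keep_ratio)   # keep high-saliency entries
        weight = np.sqrt(v) + eps               # assumed diagonal curvature weight
        numerator += weight * mask * delta
        denominator += weight
    return base + numerator / denominator

# Two hypothetical experts that each changed a different coordinate strongly.
base = np.zeros(4)
experts = [np.array([2.0, 0.1, 0.0, 0.0]), np.array([0.0, 0.0, 0.1, 4.0])]
second_moments = [np.ones(4), np.ones(4)]
merged = ota_merge(base, experts, second_moments, keep_ratio=0.25)
```

In a real setting the second moments would be read from the optimizer state saved with each fine-tuned checkpoint (for PyTorch's Adam, the `exp_avg_sq` buffers), which is why the approach needs no extra gradient passes.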
2. Observability-Aware Trajectory Optimization and OTA Merging
In robotics and control, OTA Merging leverages observability metrics derived from system and measurement models to generate trajectories that are optimally informative for estimation and calibration tasks. The core methods compute trajectory-dependent nonlinear observability Gramians using higher-order Lie derivatives of the sensor model (Hausman et al., 2016, Grebe et al., 2021).
Given a system with state $x$, control $u$, and measurement function $h$, higher-order Lie derivatives of $h$ are computed and locally Taylor-expanded along the trajectory. The resulting sensitivity matrix, a Jacobian of the expanded measurement model, is integrated over the trajectory to form the local observability Gramian. The OTA trajectory cost is a scalar function of this Gramian which, when minimized, encourages trajectories that maximize state excitation and estimation convergence (Hausman et al., 2016).
This principle is leveraged not only for self-calibration of onboard sensor parameters (e.g., IMU-GPS extrinsics in UAVs) but also extends to trajectory planning for hybrid systems, active sensor calibration (Wang et al., 16 Jun 2025), and multi-objective merging (where energy, estimation quality, and constraint satisfaction are simultaneously optimized).
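To make the idea of a trajectory-dependent Gramian concrete, the sketch below uses a deliberately simplified stand-in for the Lie-derivative construction: a robot with known positions takes range measurements to a beacon whose 2-D position is the unknown parameter, and the Gramian is accumulated from first-order measurement Jacobians only. The scenario, the function names, and the choice of cost (negative smallest singular value) are assumptions for illustration, not the formulation of the cited papers.

```python
import numpy as np

def range_jacobian(position, beacon):
    """Jacobian of h = ||position - beacon|| with respect to the beacon (1 x 2)."""
    diff = beacon - position
    return (diff / np.linalg.norm(diff)).reshape(1, -1)

def local_gramian(trajectory, beacon):
    """Sum of J^T J over the sampled trajectory (a discrete Gramian)."""
    W = np.zeros((beacon.size, beacon.size))
    for p in trajectory:
        J = range_jacobian(p, beacon)
        W += J.T @ J
    return W

def observability_cost(trajectory, beacon):
    """Negative smallest singular value: lower cost means better observability."""
    W = local_gramian(trajectory, beacon)
    return -np.linalg.svd(W, compute_uv=False).min()

beacon = np.array([0.0, 0.0])
# Radial approach: every measurement direction is parallel, so W is rank one.
radial = np.stack([np.linspace(1.0, 5.0, 20), np.zeros(20)], axis=1)
# Circular path: measurement directions span the plane.
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
circle = 3.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)

cost_radial = observability_cost(radial, beacon)
cost_circle = observability_cost(circle, beacon)
```

The circular path yields a well-conditioned Gramian while the radial path leaves one direction unobservable, which is the kind of difference an observability-aware planner exploits when choosing between trajectories.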
3. OTA Merging in Multi-Modal and Hybrid Control
OTA merging generalizes to hybrid systems and multi-modal settings where trajectory optimization must handle switching between dynamics (e.g., free motion vs. contact), multi-objective criteria (e.g., energy, smoothness, impact mitigation), or multiple candidate trajectories.
Impact-aware multi-mode trajectory optimization leverages hybrid control to dynamically select mode and control policies over a trajectory, with the timing and nature of each switch (e.g., between compliance control and active pushing) guided by physical models embedded in the optimization (Stouraitis et al., 2020). The merging of expert control policies is thus “trajectory aware” in that the optimization integrates knowledge of both the dynamical transitions and the relevant physical/estimation metrics.
In autonomous driving, OTA merging frameworks generate multiple candidate trajectories in parallel (using multiple shooting or sampling) and evaluate them using composite cost functions factoring in safety, efficiency, comfort, and consistency (Zheng et al., 2023, Jiang et al., 2021). The final merging step thus explicitly reasons about the full space of possible optimization trajectories, selecting those that best meet both physical and task-driven constraints.
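The select-among-candidates step can be illustrated with a small scoring routine. This is a hypothetical sketch: the cost terms (inverse obstacle clearance for safety, path length for efficiency, squared acceleration for comfort), the weights, and the function names are assumptions chosen for the example, and the consistency term mentioned above is omitted for brevity.

```python
import numpy as np

def composite_cost(traj, obstacle, dt, weights):
    """Weighted sum of safety, efficiency, and comfort for a path of shape (N, 2)."""
    steps = np.diff(traj, axis=0)                      # displacement per step
    acc = np.diff(steps, axis=0) / dt ** 2             # finite-difference acceleration
    clearance = np.linalg.norm(traj - obstacle, axis=1).min()
    safety = 1.0 / (clearance + 1e-6)                  # large when close to the obstacle
    efficiency = np.linalg.norm(steps, axis=1).sum()   # total path length
    comfort = (acc ** 2).sum() * dt                    # integrated squared acceleration
    return (weights["safety"] * safety
            + weights["efficiency"] * efficiency
            + weights["comfort"] * comfort)

def select_trajectory(candidates, obstacle, dt=0.1, weights=None):
    """Return the index of the lowest-cost candidate and all candidate costs."""
    if weights is None:
        weights = {"safety": 1.0, "efficiency": 1.0, "comfort": 1.0}
    costs = [composite_cost(c, obstacle, dt, weights) for c in candidates]
    return int(np.argmin(costs)), costs

xs = np.linspace(0.0, 10.0, 21)
straight = np.stack([xs, np.zeros_like(xs)], axis=1)             # passes through the obstacle
detour = np.stack([xs, 2.0 * np.sin(np.pi * xs / 10.0)], axis=1)  # swerves around it
best, costs = select_trajectory([straight, detour], obstacle=np.array([5.0, 0.0]))
```

Here the straight path is shorter and smoother but collides with the obstacle, so the safety term dominates and the detour is selected.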
4. Constraint- and Estimation-Aware Merging via Optimization Trajectories
OTA merging methods increasingly incorporate constraint satisfaction and estimation-aware metrics directly into the trajectory optimization process.
Constraint-aware diffusion models introduce a hybrid loss that combines the standard diffusion model loss with an explicit constraint-violation penalty (Li et al., 3 Jun 2024). The penalty term measures the total violation of trajectory constraints, and a companion term provides a noise-level reference. This mechanism ensures that the denoising process yields optimization trajectories that, when merged or selected, are inherently more feasible.
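A hybrid loss of this kind might look like the following sketch. It assumes a noise-prediction (mean squared error) diffusion loss and simple box constraints on the trajectory; the `penalty_weight` parameter and function names are hypothetical, and the noise-level reference used in the cited work is not modeled here.

```python
import numpy as np

def constraint_violation(traj, lower, upper):
    """Total box-constraint violation of a trajectory (zero when feasible)."""
    below = np.clip(lower - traj, 0.0, None)   # amount under the lower bound
    above = np.clip(traj - upper, 0.0, None)   # amount over the upper bound
    return (below + above).sum()

def hybrid_loss(pred_noise, true_noise, denoised_traj, lower, upper,
                penalty_weight=1.0):
    """Diffusion (noise-prediction MSE) loss plus a weighted violation penalty."""
    diffusion_loss = np.mean((pred_noise - true_noise) ** 2)
    violation = constraint_violation(denoised_traj, lower, upper)
    return diffusion_loss + penalty_weight * violation

pred, true = np.zeros(3), np.ones(3)
feasible = np.array([0.0, 0.5, 1.0])
infeasible = np.array([0.0, 0.5, 1.5])   # last point exceeds the upper bound by 0.5
loss_ok = hybrid_loss(pred, true, feasible, lower=0.0, upper=1.0)
loss_bad = hybrid_loss(pred, true, infeasible, lower=0.0, upper=1.0)
```

With identical noise-prediction error, the infeasible sample incurs a strictly larger loss, so training pressure pushes the denoiser toward constraint-satisfying trajectories.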
Estimation-aware OTA merging for systems with set-valued, state-dependent measurement uncertainties optimizes trajectories by maximizing a concave lower bound on a set-valued observability metric (Deole et al., 15 Jan 2025). This tailors the merged trajectory for improved estimation robustness and explicitly accounts for non-Gaussian, non-parametric sensor uncertainties.
5. OTA Merging in Large-Scale and Parallel Trajectory Optimization
As trajectory optimization problems scale to long horizons or large model/trajectory collections, OTA merging approaches have increasingly leveraged parallel and consensus-based algorithms.
The TOP framework decomposes trajectory optimization into segments solved in parallel using the Consensus Alternating Direction Method of Multipliers (CADMM) (Yu et al., 14 Jul 2025). Here, each subtrajectory solves a local optimization problem, possibly with closed-form updates for linear and quadratic constraints, and the consensus variables enforce continuity. The merging of solutions across the global trajectory is informed by the optimization dynamics and configuration of all segments, resulting in $O(1)$ time complexity per iteration with respect to the number of segments.
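The segment-plus-consensus structure can be shown on a toy problem. In the sketch below, which is not the TOP implementation, each segment holds local copies of its two endpoint values and prefers hypothetical targets; shared knot variables force the end of one segment to match the start of the next. The quadratic local cost, the scalar knots, and the names `consensus_admm`, `targets`, and `rho` are assumptions made for illustration.

```python
import numpy as np

def consensus_admm(targets, rho=1.0, iters=100):
    """Toy consensus ADMM over K trajectory segments.

    targets: (K, 2) array with each segment's preferred (start, end) value.
    Local cost per segment: 0.5 * ||x_k - targets[k]||^2.
    Returns the shared knot values z (K + 1,) and local endpoint copies x (K, 2).
    """
    K = targets.shape[0]
    x = np.zeros((K, 2))          # local endpoint copies
    u = np.zeros((K, 2))          # scaled dual variables
    z = np.zeros(K + 1)           # global knots shared by neighbouring segments
    counts = np.zeros(K + 1)      # number of segments touching each knot
    counts[:-1] += 1
    counts[1:] += 1
    for _ in range(iters):
        z_local = np.stack([z[:-1], z[1:]], axis=1)
        # Local step: closed-form minimizer, independent for every segment,
        # so it could run in parallel across segments.
        x = (targets + rho * (z_local - u)) / (1.0 + rho)
        # Consensus step: each knot averages the copies that touch it.
        sums = np.zeros(K + 1)
        sums[:-1] += (x + u)[:, 0]
        sums[1:] += (x + u)[:, 1]
        z = sums / counts
        # Dual step: accumulate the remaining disagreement.
        z_local = np.stack([z[:-1], z[1:]], axis=1)
        u = u + x - z_local
    return z, x

targets = np.array([[0.0, 1.0], [3.0, 4.0], [4.0, 6.0]])
z, x = consensus_admm(targets)
```

Because the local step touches only one segment's data, its per-iteration work does not grow with the number of segments when each segment has its own worker; only the cheap averaging step couples neighbours.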
Parallel OTA merging enables real-time high-fidelity trajectory synthesis and merging in large-scale settings, such as multi-agent motion planning and multi-modal model merging, with direct applicability to real-time robotics, autonomous vehicles, and complex networked systems.
6. Analytical Properties and Performance Gains
OTA merging approaches are analytically grounded in information geometry, nonlinear observability theory, and optimization theory. In model merging (Mahdavinia et al., 14 Sep 2025), analyses reveal that the second-moment statistics (curvature proxies) of independently fine-tuned models exhibit substantial overlap, providing a theoretical explanation of why simple linear merging can be effective in practice. Explicit curvature-aware aggregation further reduces negative transfer by aligning updates with the principal directions of the loss surface.
In control and perception settings (Hausman et al., 2016, Grebe et al., 2021), the use of observability metrics (Gramian singular values, Lie derivatives) in optimization ensures that merged trajectories not only achieve task objectives but also provide maximal information for downstream estimation. Empirically, observability-aware trajectories for UAV self-calibration achieve 80x speed improvements over covariance-based methods and up to 4x improvements in RMSE for critical extrinsic parameters.
A summary of select approaches and their principled use of optimization trajectory information is provided below:
Domain | OTA Statistic Used | OTA Merging Objective |
---|---|---|
LLM merge | Adam second-moment (Fisher diag.) | Curvature-aware aggregation to mitigate interference |
Robotic perception | Nonlinear observability Gramian | Trajectory design for maximal estimation information |
Diffusion models | Denoising trajectory covariances | Operator merging to preserve signal during distillation |
Multi-agent control | Parallel optimization consensus | Merging subtrajectories with global smoothness/feasibility |
7. Implications and Future Directions
OTA merging represents a shift from static or naive averaging/fusion strategies to methods that directly harness the rich trajectory statistics available from the optimization process—be it for model weights, trajectory segments, or dynamic system states. Current research directions include:
- Extending curvature- and information-aware merging to more general, non-diagonal settings and richer model classes (Nguyen et al., 26 Feb 2025, Mahdavinia et al., 14 Sep 2025)
- Dynamic constraint and estimation-aware planning in real-time, multi-modal sensor fusion (Wang et al., 16 Jun 2025, Deole et al., 15 Jan 2025)
- Integrating OTA merging with hybrid control, multi-agent interaction, and robust safety constraints in adversarial environments (Zheng et al., 2023, Li et al., 3 Jun 2024)
- Developing universal OTA merging frameworks capable of unifying model, trajectory, and control merges while scaling with modern GPU and parallel architectures (Yu et al., 14 Jul 2025)
Across the domains surveyed here, OTA merging is reported to improve robustness, reduce negative transfer, and support faster convergence or estimation by aligning merge operations with the intrinsic optimization geometry and the knowledge encoded in the optimization trajectory itself.