
OTA Merging: Trajectory-Informed Optimization

Updated 21 September 2025
  • OTA Merging is a framework that utilizes optimization trajectory statistics such as curvature estimates and observability metrics to merge models and trajectories.
  • It employs optimizer-derived measures, like Adam’s second-moment statistics, to guide curvature-aware merges that reduce negative transfer and align updates with the loss landscape.
  • Applications span robotics, control, and large-scale machine learning, yielding faster convergence and improved estimation accuracy in multi-modal and hybrid systems.

Optimization Trajectory Aware (OTA) Merging is a class of methods and frameworks that leverage information from optimization trajectories—such as parameter updates, curvature estimates, or trajectory observability metrics—to guide the merging or synthesis of multiple models, trajectories, or objectives. OTA merging is widely applicable across robotics, control, and large-scale machine learning. The unifying principle is that auxiliary statistics or structural knowledge collected during optimization (such as second-moment estimates or observability Gramians) are explicitly used to effect merging that is more robust to interference, respects critical system properties, and enhances final task performance.

1. Curvature-Aware Merging via Optimization Trajectory Statistics

OTA Merging in the model merging context refers to methods that use optimizer-derived information, particularly the second-moment statistics tracked by adaptive optimizers such as Adam, to inform the aggregation of independently fine-tuned models. In the framework of curvature-aware merging (Mahdavinia et al., 14 Sep 2025), each expert model provides a parameter update (“task vector”) and a diagonally-approximated curvature estimate from its optimization trajectory.

Given a set of experts with parameters $w^*_\tau$ and base model $w_0$, the update per task is:

$$\Delta w_\tau = w^*_\tau - w_0.$$

With Adam, the per-parameter second-moment $v_\tau$ acts as a diagonal Fisher or Hessian approximation. The saliency score (Fast Fisher Grafting, FFG) for each parameter is:

$$s_{\tau,i} = (\Delta w_{\tau,i})^2 \cdot v_{\tau,i}$$

which is used to sparsify the update, retaining only high-saliency parameters.

OTA merging then aggregates these (sparsified) updates, reweighting each using its curvature map:

$$\mathbf{w}_\text{merged} = \mathbf{w}_0 + \left(\sum_{\tau=1}^T P^*_{\tau}\right)^{-1} \left( \sum_{\tau=1}^T P^*_{\tau} (m_{\tau} \circ \Delta w_\tau) \right)$$

where $P^*_{\tau} = \operatorname{Diag}\left(\sqrt{v_{\tau}^* + \varepsilon}\right)$ and $m_{\tau}$ is the binary mask selecting important updates.

This approach directly utilizes the optimizer’s trajectory information (curvature proxies), mitigating negative transfer when merging and aligning model updates with the local loss landscape (Mahdavinia et al., 14 Sep 2025). Empirical results show improvements over standard averaging, especially when merging models specialized for diverse capabilities.
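The saliency masking and curvature-weighted aggregation above can be sketched in a few lines of NumPy. This is a minimal illustration on flat parameter vectors, not the authors' implementation: the function names (`ffg_mask`, `ota_merge`), the `keep_ratio` top-k rule for building the mask, and the default `eps` are assumptions made for the example.

```python
import numpy as np

def ffg_mask(delta, v, keep_ratio):
    """Binary mask keeping the top `keep_ratio` fraction of entries by saliency.

    Saliency follows s_i = (delta_i)^2 * v_i. The top-k selection rule is an
    assumption of this sketch.
    """
    saliency = delta**2 * v
    k = max(1, int(round(keep_ratio * delta.size)))
    top = np.argpartition(saliency, -k)[-k:]  # indices of the k largest scores
    mask = np.zeros_like(delta)
    mask[top] = 1.0
    return mask

def ota_merge(w0, experts, second_moments, keep_ratio=0.5, eps=1e-8):
    """Curvature-weighted merge of sparsified task vectors (diagonal case)."""
    num = np.zeros_like(w0)
    den = np.zeros_like(w0)
    for w_star, v in zip(experts, second_moments):
        delta = w_star - w0                # task vector
        mask = ffg_mask(delta, v, keep_ratio)
        p = np.sqrt(v + eps)               # diagonal of P_tau
        num += p * mask * delta
        den += p                           # sum of P_tau over all experts
    return w0 + num / den                  # diagonal inverse is elementwise division

# Toy example: two experts, each changing a different coordinate.
w0 = np.zeros(4)
experts = [np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 2.0, 0.0, 0.0])]
second_moments = [np.ones(4), np.ones(4)]
merged = ota_merge(w0, experts, second_moments, keep_ratio=0.25)
```

Because every $P^*_\tau$ is diagonal, the matrix inverse in the merge formula reduces to an elementwise division, which is what keeps this kind of merge cheap at large parameter counts.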

2. Observability-Aware Trajectory Optimization and OTA Merging

In robotics and control, OTA Merging leverages observability metrics derived from system and measurement models to generate trajectories that are optimally informative for estimation and calibration tasks. The core methods compute trajectory-dependent nonlinear observability Gramians using higher-order Lie derivatives of the sensor model (Hausman et al., 2016, Grebe et al., 2021).

Given a system with state $x$, control $u$, and measurement function $h(x,u)$, higher-order Lie derivatives $h^i(x,u)$ are computed and locally Taylor-expanded along the trajectory. The sensitivity matrix

$$\frac{\partial h_{t_0}(t)}{\partial x} = [I, \delta t \, I, (\delta t^2/2)I, \ldots] \cdot \mathcal{O}(x(t), u(t))$$

is integrated to form the local observability Gramian:

$$W_o(0,T,\Delta t) \approx \int_0^T K_t(t+\Delta t)^\top K_t(t+\Delta t) \, dt$$

where $K_t$ is the sensitivity Jacobian. The OTA trajectory cost is

$$\mathcal{J}_{obs} = - \sigma_{\min}(W_o)$$

which, when minimized, encourages trajectories that maximize state excitation and estimation convergence (Hausman et al., 2016).
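To make the cost concrete, the sketch below evaluates $-\sigma_{\min}(W_o)$ for two candidate control sequences. It swaps in a simpler technique than the one described above: the sensitivity $K_t$ is approximated by central finite differences of simulated outputs with respect to the initial state (an empirical Gramian), not by Lie-derivative expansion. The toy system (a position driven by an unknown gain, with only position measured) and all function names are assumptions for illustration.

```python
import numpy as np

def simulate_outputs(f, h, x0, controls, dt):
    """Forward-Euler rollout that records the measurement at every step."""
    x = np.asarray(x0, dtype=float)
    ys = []
    for u in controls:
        ys.append(np.atleast_1d(h(x, u)))
        x = x + dt * f(x, u)
    return np.array(ys)                       # shape (T, m)

def empirical_gramian(f, h, x0, controls, dt, step=1e-5):
    """W ~ sum_t K_t^T K_t dt, with K_t = d y_t / d x0 by central differences."""
    x0 = np.asarray(x0, dtype=float)
    cols = []
    for i in range(x0.size):
        e = np.zeros(x0.size)
        e[i] = step
        dy = (simulate_outputs(f, h, x0 + e, controls, dt)
              - simulate_outputs(f, h, x0 - e, controls, dt))
        cols.append(dy / (2.0 * step))
    K = np.stack(cols, axis=-1)               # shape (T, m, n)
    return dt * np.einsum("tmi,tmj->ij", K, K)

def observability_cost(W):
    """J_obs = -sigma_min(W); lower means a more informative trajectory."""
    return -np.linalg.svd(W, compute_uv=False).min()

# Hypothetical toy system: state [position, unknown gain], position measured.
f = lambda x, u: np.array([x[1] * u, 0.0])
h = lambda x, u: x[0]
x0, dt, T = [0.0, 1.0], 0.1, 50

cost_idle = observability_cost(empirical_gramian(f, h, x0, np.zeros(T), dt))
cost_excited = observability_cost(empirical_gramian(f, h, x0, np.ones(T), dt))
```

With zero input the gain never affects the measurement, so the smallest singular value of the Gramian is zero. A nonzero input excites the gain and makes the cost strictly negative, which is the behavior an observability-aware planner exploits.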

This principle is leveraged not only for self-calibration of onboard sensor parameters (e.g., IMU-GPS extrinsics in UAVs) but also extends to trajectory planning for hybrid systems, active sensor calibration (Wang et al., 16 Jun 2025), and multi-objective merging (where energy, estimation quality, and constraint satisfaction are simultaneously optimized).

3. OTA Merging in Multi-Modal and Hybrid Control

OTA merging generalizes to hybrid systems and multi-modal settings where trajectory optimization must handle switching between dynamics (e.g., free motion vs. contact), multi-objective criteria (e.g., energy, smoothness, impact mitigation), or multiple candidate trajectories.

Impact-aware multi-mode trajectory optimization leverages hybrid control to dynamically select mode and control policies over a trajectory, with the timing and nature of each switch (e.g., between compliance control and active pushing) guided by physical models embedded in the optimization (Stouraitis et al., 2020). The merging of expert control policies is thus “trajectory aware” in that the optimization integrates knowledge of both the dynamical transitions and the relevant physical/estimation metrics.

In autonomous driving, OTA merging frameworks generate multiple candidate trajectories in parallel (using multiple shooting or sampling) and evaluate them using composite cost functions factoring in safety, efficiency, comfort, and consistency (Zheng et al., 2023, Jiang et al., 2021). The final merging step thus explicitly reasons about the full space of possible optimization trajectories, selecting those that best meet both physical and task-driven constraints.
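The generate-then-score pattern can be sketched as follows. The four cost terms mirror the categories named above (safety, efficiency, comfort, consistency), but their concrete definitions, the weights, and the function names are assumptions chosen for this example and are not taken from the cited papers.

```python
import numpy as np

def composite_cost(traj, prev_traj, obstacle, dt, weights):
    """Weighted sum of four illustrative cost terms for a (T, 2) trajectory."""
    vel = np.diff(traj, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    clearance = np.min(np.linalg.norm(traj - obstacle, axis=1))
    safety = 1.0 / (clearance + 1e-6)            # large when close to the obstacle
    efficiency = -(traj[-1, 0] - traj[0, 0])     # reward forward progress along x
    comfort = np.mean(np.sum(acc**2, axis=1))    # penalize acceleration
    consistency = np.mean(np.linalg.norm(traj - prev_traj, axis=1))
    terms = np.array([safety, efficiency, comfort, consistency])
    return float(weights @ terms)

def select_trajectory(candidates, prev_traj, obstacle, dt, weights):
    """Return the index of the lowest-cost candidate and all costs."""
    costs = [composite_cost(c, prev_traj, obstacle, dt, weights) for c in candidates]
    return int(np.argmin(costs)), costs

# Three straight-line candidates at lateral offsets -2, 0, +2 (hypothetical).
xs = np.linspace(0.0, 10.0, 11)
candidates = [np.column_stack([xs, np.full_like(xs, y)]) for y in (-2.0, 0.0, 2.0)]
obstacle = np.array([5.0, 0.0])                  # blocks the centre line
prev_traj = candidates[2]                        # previous plan used offset +2
weights = np.array([1.0, 0.1, 0.1, 0.1])

best_index, costs = select_trajectory(candidates, prev_traj, obstacle, 1.0, weights)
```

In this toy setup the centre candidate passes through the obstacle and is ruled out by the safety term, and the consistency term breaks the tie between the two remaining candidates in favor of the previous plan.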

4. Constraint- and Estimation-Aware Merging via Optimization Trajectories

OTA merging methods increasingly incorporate constraint satisfaction and estimation-aware metrics directly into the trajectory optimization process.

Constraint-aware diffusion models introduce a hybrid loss composed of both standard diffusion model loss and an explicit constraint-violation penalty (Li et al., 3 Jun 2024):

$$\mathcal{L}_\text{constrained diff} = \mathcal{L}_{\text{diff}} + \lambda \cdot \left( \mathcal{L}_{\text{vio}} / \mu_{\text{vio}}^{GT} \right)$$

where $\mathcal{L}_{\text{vio}}$ measures the total violation of trajectory constraints, and $\mu_{\text{vio}}^{GT}$ provides a noise-level reference. This mechanism ensures that the denoising process yields optimization trajectories that, when merged or selected, are inherently more feasible.
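As a small numerical illustration of the hybrid loss, the sketch below combines a given diffusion loss value with a normalized violation penalty. The violation measure used here (total excess over a scalar upper bound) is an assumption for the example; the source only states that the term measures total constraint violation.

```python
import numpy as np

def total_violation(traj, upper):
    """Sum of the amounts by which trajectory samples exceed an upper bound.

    This particular constraint form is a hypothetical stand-in.
    """
    return float(np.sum(np.maximum(traj - upper, 0.0)))

def constrained_diffusion_loss(l_diff, l_vio, mu_vio_gt, lam=1.0):
    """L_diff + lambda * (L_vio / mu_vio_GT)."""
    return l_diff + lam * (l_vio / mu_vio_gt)

traj = np.array([0.5, 1.2, 1.5])
l_vio = total_violation(traj, upper=1.0)          # 0.2 + 0.5
loss = constrained_diffusion_loss(l_diff=0.3, l_vio=l_vio, mu_vio_gt=0.35, lam=2.0)
```

Dividing by the reference violation keeps the penalty on a comparable scale across noise levels, so a single weight $\lambda$ can be used throughout the denoising schedule.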

Estimation-aware OTA merging for systems with set-valued, state-dependent measurement uncertainties optimizes trajectories by maximizing a concave lower bound on a set-valued observability metric (Deole et al., 15 Jan 2025):

$$D_o^\ell(Y_{0:T}) = \sum_{t=0}^T \left\{ \sigma_{\min}(C A^t) \varepsilon - \sigma_{\max}(A^t) \varepsilon L(\bar{x}_t) - 2\Lambda(\mathcal{Y}_{\bar{x}_t}) \right\}$$

ensuring that the merged trajectory is tailored for improved estimation robustness that explicitly accounts for non-Gaussian, non-parametric sensor uncertainties.

5. OTA Merging in Large-Scale and Parallel Trajectory Optimization

As trajectory optimization problems scale to long horizons or large model/trajectory collections, OTA merging approaches have increasingly leveraged parallel and consensus-based algorithms.

The TOP framework decomposes trajectory optimization into segments solved in parallel using the Consensus Alternating Direction Method of Multipliers (CADMM) (Yu et al., 14 Jul 2025). Here, each subtrajectory solves a local optimization problem (possibly with closed-form updates for linear and quadratic constraints) and the consensus variables enforce continuity. The merging of solutions across the global trajectory is informed by the optimization dynamics and configuration of all segments, resulting in time complexity per iteration $O(1)$ with respect to the number of segments.
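The segment-plus-consensus structure can be illustrated with a generic consensus ADMM loop on a toy problem. This is not the TOP algorithm itself: the 1D smoothness cost, the fixed start and goal, the penalty `rho`, and all names are assumptions for this sketch. Each segment keeps its own copy of its waypoints, and consensus variables `z` tie the endpoint copies of adjacent segments together.

```python
import numpy as np

def segment_update(M, rho, left_target, right_target):
    """Closed-form local solve for one segment with M intervals.

    Minimizes sum_k (x[k+1] - x[k])^2
              + rho/2 * ((x[0] - left_target)^2 + (x[-1] - right_target)^2).
    """
    D = np.diff(np.eye(M + 1), axis=0)        # (M, M+1) finite-difference matrix
    H = 2.0 * D.T @ D
    H[0, 0] += rho
    H[-1, -1] += rho
    rhs = np.zeros(M + 1)
    rhs[0] = rho * left_target
    rhs[-1] = rho * right_target
    return np.linalg.solve(H, rhs)

def consensus_admm(start, goal, num_segments=4, M=5, rho=1.0, iters=1000):
    S = num_segments
    z = np.zeros(S + 1)                       # consensus values at segment boundaries
    z[0], z[-1] = start, goal                 # global endpoints stay fixed
    u_left = np.zeros(S)                      # scaled duals for each segment's start copy
    u_right = np.zeros(S)                     # scaled duals for each segment's end copy
    xs = [np.zeros(M + 1) for _ in range(S)]
    for _ in range(iters):
        # Local solves are mutually independent, so they could run in parallel.
        xs = [segment_update(M, rho, z[s] - u_left[s], z[s + 1] - u_right[s])
              for s in range(S)]
        # Consensus step: average the two copies meeting at each interior junction.
        for j in range(1, S):
            z[j] = 0.5 * ((xs[j - 1][-1] + u_right[j - 1]) + (xs[j][0] + u_left[j]))
        # Dual ascent on the remaining disagreement.
        for s in range(S):
            u_left[s] += xs[s][0] - z[s]
            u_right[s] += xs[s][-1] - z[s + 1]
    return xs, z

xs, z = consensus_admm(0.0, 1.0)
```

For this cost the global optimum is a straight line from start to goal, so the junction values should approach evenly spaced points. Each local solve touches only one segment's variables, which is why the per-iteration work per worker does not grow with the number of segments when the solves are distributed.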

Parallel OTA merging enables real-time high-fidelity trajectory synthesis and merging in large-scale settings, such as multi-agent motion planning and multi-modal model merging, with direct applicability to real-time robotics, autonomous vehicles, and complex networked systems.

6. Analytical Properties and Performance Gains

OTA merging approaches are analytically grounded in information geometry, nonlinear observability theory, and optimization theory. In model merging (Mahdavinia et al., 14 Sep 2025), analyses reveal that the second-moment statistics (curvature proxies) of independently fine-tuned models exhibit substantial overlap, providing a theoretical explanation of why simple linear merging can be effective in practice. Explicit curvature-aware aggregation further reduces negative transfer by aligning updates with the principal directions of the loss surface.

In control and perception settings (Hausman et al., 2016, Grebe et al., 2021), the use of observability metrics (Gramian singular values, Lie derivatives) in optimization ensures that merged trajectories not only achieve task objectives but also provide maximal information for downstream estimation. Empirically, observability-aware trajectories for UAV self-calibration achieve 80x speed improvements over covariance-based methods and up to 4x improvements in RMSE for critical extrinsic parameters.

A summary of select approaches and their principled use of optimization trajectory information is provided below:

| Domain | OTA Statistic Used | OTA Merging Objective |
|---|---|---|
| LLM merge | Adam second-moment (Fisher diag.) | Curvature-aware aggregation to mitigate interference |
| Robotic perception | Nonlinear observability Gramian | Trajectory design for maximal estimation information |
| Diffusion models | Denoising trajectory covariances | Operator merging to preserve signal during distillation |
| Multi-agent control | Parallel optimization consensus | Merging subtrajectories with global smoothness/feasibility |

7. Implications and Future Directions

OTA merging represents a shift from static or naive averaging/fusion strategies to methods that directly harness the rich trajectory statistics available from the optimization process—be it for model weights, trajectory segments, or dynamic system states. Current research directions include:

In all domains, OTA merging increases robustness, reduces negative transfer, and supports rapid convergence or estimation—all by aligning merge operations with the intrinsic optimization geometry and the knowledge encoded in the optimization trajectory itself.
