Omnidirectional Differentiable Bundle Adjustment
- ODBA is a spherical optimization module that refines camera poses and sparse inverse depths for 360° visual odometry using a differentiable Gauss–Newton approach.
- It leverages a distortion-aware feature extractor and spherical reprojection models, reducing absolute trajectory error by up to 56% compared to traditional methods.
- The framework supports end-to-end training by integrating geometric supervision with feature learning, ensuring robust performance under aggressive motion and distortions.
Omnidirectional Differentiable Bundle Adjustment (ODBA) is a geometric optimization module designed for accurate pose and depth refinement in monocular omnidirectional visual odometry (OVO) systems leveraging 360-degree cameras. Unlike classical photometric or feature-based approaches, ODBA integrates a distortion-aware spherical feature interface and employs spherical trigonometric camera models, enabling joint optimization of camera poses and sparse landmark inverse depths via a differentiable Gauss–Newton solver with hand-derived Jacobians. This formulation supports end-to-end training, geometric supervision, and robust pose estimation under challenging conditions such as aggressive motion and severe image distortion (Guo et al., 5 Jan 2026).
1. Mathematical Foundations
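ODBA models the geometry of a 360-degree equirectangular image as a mapping between image pixels and spherical coordinates. Each pixel $(u, v)$ in an image of size $W \times H$ represents spherical angles $(\theta, \phi)$ (longitude and latitude), computed as $\theta = (u - c_x)/f_x$ and $\phi = (v - c_y)/f_y$, with intrinsics $(f_x, f_y, c_x, c_y)$ determined by the image size.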
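The forward projection $\Pi$ and its inverse $\Pi^{-1}$ follow from this mapping: $\Pi$ converts a 3D point to spherical angles and then to pixel coordinates, while $\Pi^{-1}$ lifts a pixel and its inverse depth to a 3D point along the corresponding viewing ray.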
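The pixel-to-sphere mapping and the two projection directions can be sketched as follows. This is a minimal illustration, not the paper's implementation: the intrinsics ($f_x = W/2\pi$, $f_y = H/\pi$, principal point at the image center) and the axis convention (x right, y down, z forward) are assumptions chosen because they are a common equirectangular convention, and the function names are hypothetical.

```python
import math

def make_intrinsics(width, height):
    """Assumed intrinsics: 2*pi radians across the width, pi across the height."""
    return (width / (2.0 * math.pi), height / math.pi, width / 2.0, height / 2.0)

def pixel_to_angles(u, v, K):
    """Pixel (u, v) -> spherical angles (theta, phi)."""
    fx, fy, cx, cy = K
    return (u - cx) / fx, (v - cy) / fy

def unproject(u, v, inv_depth, K):
    """Inverse projection: pixel + inverse depth -> 3D point.
    Axis convention (x right, y down, z forward) is an assumption."""
    theta, phi = pixel_to_angles(u, v, K)
    r = 1.0 / inv_depth
    return (r * math.cos(phi) * math.sin(theta),
            r * math.sin(phi),
            r * math.cos(phi) * math.cos(theta))

def project(X, K):
    """Forward projection: 3D point -> pixel (u, v)."""
    x, y, z = X
    fx, fy, cx, cy = K
    theta = math.atan2(x, z)                 # longitude
    phi = math.atan2(y, math.hypot(x, z))    # latitude
    return fx * theta + cx, fy * phi + cy

K = make_intrinsics(1024, 512)
point = unproject(100.0, 200.0, 0.5, K)
print(project(point, K))  # round trip back to (approximately) the input pixel
```

Under these assumptions the image center maps to the forward axis, and projecting an unprojected pixel returns the original coordinates regardless of the inverse depth used.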
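For each patch $k$ with image coordinate $p_k$ and inverse depth $d_k$, the 3D point corresponding to $p_k$ is reprojected from frame $i$ to frame $j$ via the relative pose $T_{ji}$:

$$p'_{kj} = \Pi\left(T_{ji} \cdot \Pi^{-1}(p_k, d_k)\right)$$

Simultaneously, the learned flow head predicts an offset for patch matching:

$$p^{*}_{kj} = p_k + F_{\mathrm{rnn}}(\mathrm{corr})$$

The objective minimized by ODBA is the weighted sum of squared reprojection errors:

$$\sum_{(k,j) \in \mathcal{E}} \left\| p^{*}_{kj} - p'_{kj} \right\|^2_{\Sigma_{kj}}$$

where $\Sigma_{kj}$ is a confidence-weighted covariance produced by the RNN updater.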
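The reprojection and the weighted objective can be illustrated with a short sketch. Several things here are assumptions rather than details from the source: the equirectangular convention (same as a standard $W/2\pi$, $H/\pi$ parameterization), the representation of the confidence weighting as two per-axis scalars (a diagonal weight), and all function names. Longitude wrap-around at the image border is ignored for brevity.

```python
import math

def unproject(p, inv_depth, K):
    """Pixel + inverse depth -> 3D point (assumed equirectangular convention)."""
    fx, fy, cx, cy = K
    theta, phi = (p[0] - cx) / fx, (p[1] - cy) / fy
    r = 1.0 / inv_depth
    return (r * math.cos(phi) * math.sin(theta),
            r * math.sin(phi),
            r * math.cos(phi) * math.cos(theta))

def project(X, K):
    """3D point -> pixel."""
    fx, fy, cx, cy = K
    return (fx * math.atan2(X[0], X[2]) + cx,
            fy * math.atan2(X[1], math.hypot(X[0], X[2])) + cy)

def transform(R, t, X):
    """Apply a rigid transform given as a 3x3 rotation and a translation."""
    return tuple(sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3))

def reproject(p, inv_depth, R_ji, t_ji, K):
    """p' = Pi(T_ji * Pi^{-1}(p, d))."""
    return project(transform(R_ji, t_ji, unproject(p, inv_depth, K)), K)

def weighted_cost(terms):
    """Sum of weighted squared residuals.
    Each term is (target, predicted, weight), all 2-tuples; the per-axis
    weight stands in for the confidence weighting (an assumption)."""
    total = 0.0
    for target, pred, w in terms:
        eu, ev = target[0] - pred[0], target[1] - pred[1]
        total += w[0] * eu * eu + w[1] * ev * ev
    return total

K = (1024 / (2 * math.pi), 512 / math.pi, 512.0, 256.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pred = reproject((300.0, 180.0), 0.4, identity, (0.1, 0.0, 0.0), K)
target = (pred[0] + 1.0, pred[1] - 0.5)   # pretend the flow head moved the match
print(weighted_cost([(target, pred, (0.8, 0.8))]))
```

With an identity relative pose the reprojection returns the original pixel, which is a quick sanity check on any projection convention.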
2. Parameterization and Feature Interfaces
ODBA jointly refines the following:
- Camera poses $T_i \in SE(3)$, parameterized as 6D Lie algebra increments $\Delta\xi_i$
- Inverse depths $d_k$ associated with sparse “patch” landmarks
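To make the 6D increment parameterization concrete, the sketch below maps a twist to a rigid transform with the closed-form $SE(3)$ exponential and applies it to a pose. The twist ordering (translation part first), the left-multiplicative update, and the helper names are assumptions of this illustration; the source does not specify them.

```python
import math

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]

def matmul(A, B):
    """Dense matrix product on nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def se3_exp(xi):
    """Map a twist xi = (rho, omega) to a 4x4 transform.
    Translation-first ordering is an assumption of this sketch."""
    rho, omega = xi[:3], xi[3:]
    theta = math.sqrt(sum(c * c for c in omega))
    K = hat(omega)
    K2 = matmul(K, K)
    if theta < 1e-8:                       # small-angle limits of the coefficients
        a, b, c = 1.0, 0.5, 1.0 / 6.0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
        c = (theta - math.sin(theta)) / theta ** 3
    eye = [[float(i == j) for j in range(3)] for i in range(3)]
    R = [[eye[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)] for i in range(3)]
    V = [[eye[i][j] + b * K[i][j] + c * K2[i][j] for j in range(3)] for i in range(3)]
    t = [sum(V[i][j] * rho[j] for j in range(3)) for i in range(3)]
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]

def retract(T, delta_xi):
    """Pose update T <- exp(delta_xi) * T (left-multiplicative, assumed)."""
    return matmul(se3_exp(delta_xi), T)

T0 = [[float(i == j) for j in range(4)] for i in range(4)]
T1 = retract(T0, [0.1, 0.0, 0.0, 0.0, 0.0, math.pi / 2])
for row in T1:
    print([round(x, 3) for x in row])
```

Working with increments keeps the optimization variables minimal (six per pose) while the stored pose stays a valid rigid transform after every update.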
Rather than dense depth maps, ODBA associates each sparse patch with a single inverse-depth scalar $d_k$, initialized from the keyframe’s mean patch depth. The distortion-aware spherical feature extractor (DAS-Feat) computes matching features and context features via SphereResNet. A patchification module selects high-saliency centers $p_k$ by evaluating gradient magnitude and crops a local neighborhood around each center, which inherits the center’s depth $d_k$. Edges $(k, j)$ link each patch $k$ from a keyframe to the neighboring frames $j$ within a temporal window.
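The patch selection and edge construction described above might look roughly like the following. This is a hypothetical sketch: the input is a plain 2D map (the source does not specify here which map the gradient is taken over), saliency is a central-difference gradient magnitude, and `select_patch_centers` and this particular `build_edges` signature are illustrative names, not the paper's API.

```python
def gradient_magnitude(img):
    """Central-difference gradient magnitude on the interior of a 2D list."""
    h, w = len(img), len(img[0])
    mag = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = 0.5 * (img[y][x + 1] - img[y][x - 1])
            gy = 0.5 * (img[y + 1][x] - img[y - 1][x])
            mag[(x, y)] = (gx * gx + gy * gy) ** 0.5
    return mag

def select_patch_centers(img, num_patches):
    """Pick the num_patches pixels with the largest gradient magnitude."""
    mag = gradient_magnitude(img)
    return sorted(mag, key=mag.get, reverse=True)[:num_patches]

def build_edges(patch_ids, keyframe, radius, num_frames):
    """Link each patch of `keyframe` to the other frames within `radius`."""
    lo = max(0, keyframe - radius)
    hi = min(num_frames - 1, keyframe + radius)
    return [(k, j) for k in patch_ids for j in range(lo, hi + 1) if j != keyframe]

# A vertical step edge: the salient centers sit next to the step.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
centers = select_patch_centers(image, 2)
edges = build_edges(range(len(centers)), keyframe=5, radius=2, num_frames=8)
print(centers, edges)
```

Each selected center would then carry one inverse-depth scalar, so the number of depth unknowns grows with the number of patches rather than with image resolution.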
This approach is distinct from pinhole bundle adjustment in two respects: (a) Projections use spherical trigonometry rather than perspective projection. (b) Depth is parameterized per sparse patch, not per pixel or mesh vertex.
3. Optimization and Differentiability
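Optimization uses a Gauss–Newton solver (without damping), typically running 3–5 iterations over Lie algebra pose increments and scalar depth updates. For each edge $(k, j)$, the residual and the Jacobians of the reprojection with respect to both poses and the inverse depth are computed:

$$e_{kj} = p^{*}_{kj} - p'_{kj}, \qquad J_i = \frac{\partial p'_{kj}}{\partial \xi_i}, \quad J_j = \frac{\partial p'_{kj}}{\partial \xi_j}, \quad J_d = \frac{\partial p'_{kj}}{\partial d_k}$$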
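The full block-Hessian system (Eq. (9)) is:

$$
\begin{bmatrix}
w J_i^\top J_i & w J_i^\top J_j & w J_i^\top J_d \\
w J_j^\top J_i & w J_j^\top J_j & w J_j^\top J_d \\
w J_d^\top J_i & w J_d^\top J_j & w J_d^\top J_d
\end{bmatrix}
\begin{bmatrix} \Delta\xi_i \\ \Delta\xi_j \\ \Delta d \end{bmatrix}
=
\begin{bmatrix} w J_i^\top e \\ w J_j^\top e \\ w J_d^\top e \end{bmatrix}
$$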
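The Schur complement eliminates $\Delta d$ to solve for the pose updates, followed by scalar depth updates. Grouping the pose blocks of the system above into $H_{\xi\xi}$, the pose-depth coupling into $H_{\xi d}$, the depth block into $H_{dd}$, and the right-hand side into $b_\xi$ and $b_d$:

$$\left(H_{\xi\xi} - H_{\xi d} H_{dd}^{-1} H_{\xi d}^\top\right) \Delta\xi = b_\xi - H_{\xi d} H_{dd}^{-1} b_d$$

$$\Delta d = H_{dd}^{-1}\left(b_d - H_{\xi d}^\top \Delta\xi\right)$$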
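The Schur elimination can be checked numerically on a small system. The sketch below assumes the depth block is diagonal, which holds when each residual involves exactly one scalar inverse depth, and uses a plain Gaussian-elimination helper in place of whatever solver the original implementation uses. All names and the example numbers are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def schur_solve(B, E, C_diag, b_pose, b_depth):
    """Solve [[B, E], [E^T, diag(C)]] [dxi; dd] = [b_pose; b_depth].
    B: pose block, E: pose-depth coupling, C_diag: diagonal depth block."""
    n, m = len(B), len(C_diag)
    # Reduced pose system: S = B - E C^{-1} E^T
    S = [[B[r][c] - sum(E[r][k] * E[c][k] / C_diag[k] for k in range(m))
          for c in range(n)] for r in range(n)]
    rhs = [b_pose[r] - sum(E[r][k] * b_depth[k] / C_diag[k] for k in range(m))
           for r in range(n)]
    d_pose = solve(S, rhs)
    # Back-substitute for the scalar depth updates.
    d_depth = [(b_depth[k] - sum(E[r][k] * d_pose[r] for r in range(n))) / C_diag[k]
               for k in range(m)]
    return d_pose, d_depth

B = [[4.0, 1.0], [1.0, 3.0]]
E = [[1.0, 0.5], [0.2, 1.0]]
C = [2.0, 5.0]
print(schur_solve(B, E, C, [1.0, 2.0], [3.0, 4.0]))
```

Because the depth block is diagonal, inverting it costs one division per patch, so the only dense solve is over the (much smaller) pose block.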
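The Jacobians are derived by hand, applying the chain rule through the spherical projection $\Pi$ and the rigid transformation $T_{ji}$.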
All linear solvers and pose updates are implemented in PyTorch with autograd compatibility, enabling backpropagation of gradients through the residuals, Jacobians, and network modules.
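As an illustration of what a hand-derived Jacobian for a spherical projection involves, the sketch below differentiates an equirectangular projection with respect to the 3D point and compares the result against central finite differences. The projection convention ($\theta = \operatorname{atan2}(x, z)$, $\phi = \operatorname{atan2}(y, \sqrt{x^2 + z^2})$) is an assumption, and this covers only the point-derivative factor of the chain rule, not the paper's full pose and depth Jacobians.

```python
import math

def project(X, K):
    """Assumed equirectangular projection of a 3D point to pixel coordinates."""
    x, y, z = X
    fx, fy, cx, cy = K
    return (fx * math.atan2(x, z) + cx,
            fy * math.atan2(y, math.hypot(x, z)) + cy)

def projection_jacobian(X, K):
    """Analytic 2x3 Jacobian d(u, v)/d(x, y, z) of `project`."""
    x, y, z = X
    fx, fy, _, _ = K
    rho2 = x * x + z * z          # squared distance from the vertical axis
    rho = math.sqrt(rho2)
    r2 = rho2 + y * y             # squared range
    return [[fx * z / rho2, 0.0, -fx * x / rho2],
            [-fy * x * y / (rho * r2), fy * rho / r2, -fy * z * y / (rho * r2)]]

def numeric_jacobian(X, K, eps=1e-6):
    """Central finite-difference Jacobian, used only as a cross-check."""
    J = [[0.0] * 3 for _ in range(2)]
    for c in range(3):
        hi = list(X); hi[c] += eps
        lo = list(X); lo[c] -= eps
        ph, pl = project(hi, K), project(lo, K)
        for r in range(2):
            J[r][c] = (ph[r] - pl[r]) / (2 * eps)
    return J

K = (1024 / (2 * math.pi), 512 / math.pi, 512.0, 256.0)
X = (0.3, -0.2, 1.5)
print(projection_jacobian(X, K))
print(numeric_jacobian(X, K))
```

A finite-difference comparison of this kind is a common way to validate analytic Jacobians before relying on them inside a solver.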
4. Data Flow and Algorithmic Pipeline
Figure 1 in (Guo et al., 5 Jan 2026) depicts the workflow, summarized in the following pseudocode block:
```
Inputs: {I_i,…,I_j}, {T_i,…,T_j}, {d_1,…,d_N}
f_i, h_i ← SphereResNet(I_i) for all frames
{(p_k, d_k, g_k)} ← Patchify(f_i)
E ← build_edges(i, r)
for iter = 1 ... M do
    for each (k, j) ∈ E do
        gpatch ← g_k
        hpatch ← crop(h_j, Π(T_{ji} ⋅ Π⁻¹(p_k, d_k)))
        corr ← ⟨gpatch, hpatch⟩
        p_{k j}* ← p_k + F_rnn(corr)
        p_{k j}' ← Π(T_{ji} ⋅ Π⁻¹(p_k, d_k))
        e_{k j} ← p_{k j}* − p_{k j}'
        Compute J_i, J_j, J_d
    end
    Assemble H, b via ∑ JᵀJ, Jᵀe
    Solve [H][Δξ;Δd]=b (Schur complement)
    Update poses and depths
end
Output: refined {T_n}, {d_k}
```
At each iteration, the network computes patch-to-frame correlations and flow updates, applies spherical reprojection, evaluates residuals, computes Jacobians, constructs the Hessian system, and updates variables.
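The residual, Jacobian, solve, and update cycle can be seen in miniature in the toy loop below, which refines a single inverse depth with undamped Gauss–Newton. It is a deliberately reduced stand-in, not the ODBA pipeline: there is no network, the relative pose is known (identity rotation plus a translation), the target match is synthesized from a ground-truth depth, the Jacobian is a finite difference rather than hand-derived, and the projection convention and all names are assumptions.

```python
import math

W, H = 1024, 512
K = (W / (2 * math.pi), H / math.pi, W / 2.0, H / 2.0)   # assumed intrinsics

def unproject(p, d):
    """Pixel + inverse depth -> 3D point (assumed convention)."""
    theta, phi = (p[0] - K[2]) / K[0], (p[1] - K[3]) / K[1]
    r = 1.0 / d
    return (r * math.cos(phi) * math.sin(theta),
            r * math.sin(phi),
            r * math.cos(phi) * math.cos(theta))

def project(X):
    """3D point -> pixel."""
    return (K[0] * math.atan2(X[0], X[2]) + K[2],
            K[1] * math.atan2(X[1], math.hypot(X[0], X[2])) + K[3])

def reproject(p, d, t):
    """Toy relative pose: identity rotation, translation t."""
    X = unproject(p, d)
    return project((X[0] + t[0], X[1] + t[1], X[2] + t[2]))

def refine_inverse_depth(p, target, t, d0, weight=(1.0, 1.0), iters=5, eps=1e-6):
    """Undamped Gauss-Newton on one inverse depth."""
    d = d0
    for _ in range(iters):
        pred = reproject(p, d, t)
        e = (target[0] - pred[0], target[1] - pred[1])       # residual p* - p'
        hi, lo = reproject(p, d + eps, t), reproject(p, d - eps, t)
        J = ((hi[0] - lo[0]) / (2 * eps), (hi[1] - lo[1]) / (2 * eps))
        H_dd = weight[0] * J[0] ** 2 + weight[1] * J[1] ** 2  # w J^T J
        b_d = weight[0] * J[0] * e[0] + weight[1] * J[1] * e[1]  # w J^T e
        d += b_d / H_dd                                        # scalar update
    return d

p, t, d_true = (600.0, 300.0), (0.2, 0.0, 0.05), 0.5
target = reproject(p, d_true, t)
print(refine_inverse_depth(p, target, t, d0=0.3))
```

In the full system the same cycle runs over all edges at once, with pose increments solved jointly and the depth updates recovered after Schur elimination.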
5. Experimental Validation
Ablation studies conducted on the real-world 360DVO benchmark and public synthetic datasets (TartanAir V2 and 360VO) demonstrate ODBA's efficacy:
- DPVO (pinhole + DBA) on 360DVO: ATE = 7.83 m (Easy), 6.92 m (Hard)
- Using ODBA but keeping a standard ResNet (no sphere): catastrophic failure (9.99 m)
- Pure SphereNet features + ODBA: divergence, indicating instability without residual connections
- Full 360DVO (SphereResNet + ODBA): ATE = 3.31 m (Easy), 4.35 m (Hard), 56% reduction over pinhole DPVO
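The relative ATE reductions implied by the 360DVO figures listed above can be recomputed directly, as in this small sketch. How the headline 56% figure was aggregated or rounded is not stated here, so the per-split values need not match it exactly; `relative_reduction` is an illustrative helper name.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline

# ATE values (metres) from the ablation list: DPVO vs. full 360DVO.
easy = relative_reduction(7.83, 3.31)
hard = relative_reduction(6.92, 4.35)
print(f"Easy: {easy:.1%}, Hard: {hard:.1%}")
```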
On TartanAir V2 and the synthetic 360VO dataset, 360DVO with ODBA reduces ATE by approximately 37.5% and 10%, respectively, over leading baselines including 360VO and OpenVSLAM.
End-to-end training with ODBA is essential: it enforces accurate spherical geometry constraints and lets geometric supervision backpropagate into DAS-Feat and the flow updater, which is critical for robust optimization in non-perspective geometries.
6. Distinctions and Implications
ODBA extends classical bundle adjustment to omnidirectional imagery by:
- Employing spherical trigonometry for pixel-to-ray conversion via custom projection heads
- Optimizing sparse inverse-depths per patch instead of dense pixelwise depth
- Utilizing end-to-end differentiability through all optimization blocks, facilitating joint learning of features and geometry
A plausible implication is that the combination of feature learning (via DAS-Feat), spherical reprojection, and differentiable optimization accounts for the gains in robustness and accuracy over non-differentiable, perspective-only BA modules. The failed and divergent runs in the ablation studies indicate that both residual connections and spherical-aware networks are needed for ODBA to converge.
In summary, ODBA is an omnidirectional, end-to-end differentiable bundle adjustment pipeline that tightly couples learned distortion-resistant features and spherical reprojection geometry via a Lie-algebra-based Gauss–Newton solver, yielding substantial gains in pose and depth estimation accuracy for monocular 360-degree visual odometry (Guo et al., 5 Jan 2026).