Omnidirectional Differentiable Bundle Adjustment

Updated 6 January 2026
  • ODBA is a spherical optimization module that refines camera poses and sparse inverse depths for 360° visual odometry using a differentiable Gauss–Newton approach.
  • It leverages a distortion-aware feature extractor and spherical reprojection models, reducing absolute trajectory error (ATE) by up to 56% relative to the pinhole DPVO baseline.
  • The framework supports end-to-end training by integrating geometric supervision with feature learning, ensuring robust performance under aggressive motion and distortions.

Omnidirectional Differentiable Bundle Adjustment (ODBA) is a geometric optimization module designed for accurate pose and depth refinement in monocular omnidirectional visual odometry (OVO) systems leveraging 360-degree cameras. Unlike classical photometric or feature-based approaches, ODBA integrates a distortion-aware spherical feature interface and employs spherical trigonometric camera models, enabling joint optimization of camera poses and sparse landmark inverse depths via a differentiable Gauss–Newton solver with hand-derived Jacobians. This formulation supports end-to-end training, geometric supervision, and robust pose estimation under challenging conditions such as aggressive motion and severe image distortion (Guo et al., 5 Jan 2026).

1. Mathematical Foundations

ODBA models the geometry of a 360-degree equirectangular image as a mapping between image pixels and spherical coordinates. Each pixel $(u,v)$ in an image of size $H \times W$ corresponds to the spherical angles $(\theta,\phi)$ of a camera-frame point $X=(x,y,z)$, computed as $\theta=\operatorname{atan2}(x,z)$ and $\phi=\arcsin(y/\|X\|)$, with intrinsics

$$K = \begin{bmatrix} W/(2\pi) & 0 & W/2 \\ 0 & -H/\pi & H/2 \\ 0 & 0 & 1 \end{bmatrix}.$$

The forward projection $\Pi(X)$ and its inverse $\Pi^{-1}(p,d)$ are:

  • $\Pi(X) = K\,[\theta;\,\phi;\,1]$, with $\theta = \operatorname{atan2}(x,z)$ and $\phi = \arcsin(y/\|X\|)$
  • $\Pi^{-1}(p, d) = X = \frac{1}{d}\,[\cos\phi\,\sin\theta;\ \sin\phi;\ \cos\phi\,\cos\theta]$, where $[\theta;\,\phi;\,1] = K^{-1}[u;\,v;\,1]$ for $p=(u,v)$, so the point lies at range $\|X\| = 1/d$
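As a minimal pure-Python sketch of these two maps, the inverse below is written so that it exactly inverts the forward model $\theta=\operatorname{atan2}(x,z)$, $\phi=\arcsin(y/\|X\|)$. The function names are hypothetical, and we assume $y$ points up (toward image row 0), which is what the negative $-H/\pi$ entry of $K$ implies:

```python
import math

def intrinsics(H, W):
    """Equirectangular intrinsics K (3x3 nested lists), as defined above."""
    return [[W / (2 * math.pi), 0.0, W / 2],
            [0.0, -H / math.pi, H / 2],
            [0.0, 0.0, 1.0]]

def project(X, H, W):
    """Pi(X): camera-frame point (x, y, z) -> equirectangular pixel (u, v)."""
    x, y, z = X
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(x, z)   # longitude in (-pi, pi]
    phi = math.asin(y / r)     # latitude in [-pi/2, pi/2]
    K = intrinsics(H, W)
    # [u; v; 1] = K [theta; phi; 1]
    return K[0][0] * theta + K[0][2], K[1][1] * phi + K[1][2]

def unproject(p, d, H, W):
    """Pi^{-1}(p, d): pixel (u, v) and inverse depth d -> point at range 1/d."""
    u, v = p
    K = intrinsics(H, W)
    # [theta; phi; 1] = K^{-1} [u; v; 1]; K has no skew, so this is elementwise
    theta = (u - K[0][2]) / K[0][0]
    phi = (v - K[1][2]) / K[1][1]
    return (math.cos(phi) * math.sin(theta) / d,
            math.sin(phi) / d,
            math.cos(phi) * math.cos(theta) / d)

# Round trip on a 512 x 1024 panorama: recovers (300, 100) up to float rounding
print(project(unproject((300.0, 100.0), 0.5, 512, 1024), 512, 1024))
```

The optical axis $(0,0,1)$ lands at the image center $(W/2, H/2)$, and the straight-up direction $(0,1,0)$ lands on the top row.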

For each patch $k$ with image coordinate $p_k^i$ and inverse depth $d_k$, the corresponding 3D point is reprojected from frame $i$ to frame $j$ via $T_{ji} = T_j T_i^{-1}$:

$$p_{kj}' = \Pi\left(T_{ji} \cdot \Pi^{-1}(p_k^i, d_k)\right)$$
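The reprojection step can be sketched with NumPy. This assumes (not stated in the source) that each $T_n$ is a 4×4 world-to-camera matrix, so that $T_{ji} = T_j T_i^{-1}$ maps frame-$i$ coordinates into frame $j$; the 512×1024 resolution and helper names are illustrative:

```python
import numpy as np

H, W = 512, 1024  # illustrative equirectangular resolution

def project(X):
    """Pi: camera-frame point -> pixel (u, v) via theta = atan2(x, z), phi = asin(y/|X|)."""
    theta = np.arctan2(X[0], X[2])
    phi = np.arcsin(X[1] / np.linalg.norm(X))
    return np.array([W / (2 * np.pi) * theta + W / 2, -H / np.pi * phi + H / 2])

def unproject(p, d):
    """Pi^{-1}: pixel (u, v) and inverse depth d -> point at range 1/d."""
    theta = (p[0] - W / 2) * 2 * np.pi / W
    phi = (H / 2 - p[1]) * np.pi / H
    return np.array([np.cos(phi) * np.sin(theta), np.sin(phi),
                     np.cos(phi) * np.cos(theta)]) / d

def reproject(p_i, d, T_i, T_j):
    """p'_kj = Pi(T_ji * Pi^{-1}(p_i, d)), with T_ji = T_j T_i^{-1}."""
    T_ji = T_j @ np.linalg.inv(T_i)
    X_i = np.append(unproject(p_i, d), 1.0)  # homogeneous point in frame i
    return project((T_ji @ X_i)[:3])

# The image center at inverse depth 0.5 is the point (0, 0, 2); a world-to-camera
# translation of (-2, 0, 0) in frame j moves it to longitude -pi/4.
T_j = np.eye(4)
T_j[:3, 3] = [-2.0, 0.0, 0.0]
print(reproject(np.array([512.0, 256.0]), 0.5, np.eye(4), T_j))  # about [384, 256]
```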

Simultaneously, the learned flow head predicts an offset for patch matching:

$$p_{kj}^* = p_k^i + F_{rnn}\left(\langle g_k, h_j \rangle\right)$$

The objective minimized by ODBA is the weighted sum of squared reprojection errors:

$$E(\{T\}, d) = \sum_{(k,j)\in E} \left\| p_{kj}^* - p_{kj}' \right\|^2_{\Sigma_{kj}}$$

where $\|e\|^2_{\Sigma} = e^\top \Sigma^{-1} e$ is the squared Mahalanobis norm and $\Sigma_{kj}$ is a confidence-weighted covariance produced by the RNN updater.
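For concreteness, the objective can be evaluated as below. This sketch assumes a full 2×2 covariance per edge (the source does not say whether $\Sigma_{kj}$ is diagonal) and solves against $\Sigma_{kj}$ rather than forming its inverse; `ba_energy` is a hypothetical name:

```python
import numpy as np

def ba_energy(p_star, p_prime, Sigma):
    """E = sum over edges of e^T Sigma^{-1} e, with e = p* - p'.

    p_star, p_prime: (n_edges, 2) target and reprojected pixels.
    Sigma: (n_edges, 2, 2) per-edge covariances.
    """
    e = p_star - p_prime
    # Solve Sigma x = e per edge instead of forming Sigma^{-1} explicitly
    Sinv_e = np.linalg.solve(Sigma, e[..., None])[..., 0]
    return float(np.sum(e * Sinv_e))

p_star = np.array([[1.0, 2.0], [3.0, 4.0]])
p_prime = np.array([[0.0, 0.0], [3.0, 3.0]])
Sigma = np.array([np.diag([1.0, 4.0]), np.diag([0.5, 0.5])])
print(ba_energy(p_star, p_prime, Sigma))  # 1/1 + 4/4 + 0 + 1/0.5 = 4.0
```

A large $\Sigma_{kj}$ (low confidence) shrinks that edge's contribution, which is how the RNN updater can down-weight unreliable matches.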

2. Parameterization and Feature Interfaces

ODBA jointly refines the following:

  • Camera poses $T_n \in SE(3)$, parameterized by 6D Lie algebra increments $\xi_n \in \mathfrak{se}(3)$
  • Inverse depths $d_k$ associated with $N$ sparse “patch” landmarks
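Pose increments enter through the SE(3) exponential map. Below is a minimal sketch of the standard closed form; the ordering $\xi = (\rho, \omega)$, translation first, is an assumption since the source does not fix it, and `se3_exp` and `left_update` are hypothetical names:

```python
import numpy as np

def hat(w):
    """3-vector -> 3x3 skew-symmetric matrix, so that hat(w) @ v == cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Closed-form exponential map se(3) -> SE(3) for xi = (rho, omega)."""
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    Om = hat(omega)
    if theta < 1e-8:
        # Second-order Taylor expansion avoids dividing by ~0
        R = np.eye(3) + Om + 0.5 * Om @ Om
        V = np.eye(3) + 0.5 * Om + Om @ Om / 6.0
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (1.0 - A) / theta**2
        R = np.eye(3) + A * Om + B * Om @ Om   # Rodrigues' formula
        V = np.eye(3) + B * Om + C * Om @ Om   # left Jacobian of SO(3)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def left_update(T, dxi):
    """Pose update T <- exp(dxi) T."""
    return se3_exp(dxi) @ T
```

Optimizing the 6D increment rather than the 4×4 matrix keeps every iterate a valid rigid transform.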

Rather than dense depth maps, ODBA associates each sparse patch $k$ with a single inverse-depth scalar $d_k$, initialized from the keyframe’s mean patch depth. The distortion-aware spherical feature extractor (DAS-Feat) computes matching features $f_i \in \mathbb{R}^{H \times W \times 128}$ and context features $h_i \in \mathbb{R}^{H \times W \times 384}$ via SphereResNet. A patchification module selects high-saliency centers $p_k$ by evaluating the gradient magnitude of $f_i$ and crops $3 \times 3$ neighborhoods $g_k \in \mathbb{R}^{3 \times 3 \times 128}$, each inheriting the center’s depth $d_k$. Edges $E$ link each patch from a keyframe $i$ to neighboring frames $j$ within a temporal window.
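The patchification step might look roughly like the following hypothetical sketch. The source says only that centers are chosen by the gradient magnitude of $f_i$ and that each patch starts from one keyframe-level inverse depth, so the channel-summed finite-difference gradient, the top-$N$ selection, and the border exclusion (which ignores the horizontal wrap-around of equirectangular images) are our assumptions:

```python
import numpy as np

def patchify(f, num_patches, init_inv_depth):
    """Hypothetical patch selection on a feature map f of shape (H, W, C).

    Returns centers (N, 2) as (u, v) = (column, row), inverse depths (N,),
    and 3x3 feature crops g_k of shape (N, 3, 3, C).
    """
    # Finite-difference gradients along rows and columns, per channel
    g_row, g_col = np.gradient(f, axis=(0, 1))
    saliency = np.sqrt(g_row**2 + g_col**2).sum(axis=-1)
    # Exclude the 1-pixel border so every 3x3 crop stays inside the map
    saliency[[0, -1], :] = -np.inf
    saliency[:, [0, -1]] = -np.inf
    top = np.argsort(saliency, axis=None)[::-1][:num_patches]
    rows, cols = np.unravel_index(top, saliency.shape)
    centers = np.stack([cols, rows], axis=1)
    crops = np.stack([f[r - 1:r + 2, c - 1:c + 2] for r, c in zip(rows, cols)])
    # Every patch starts from the same keyframe-level inverse depth
    inv_depths = np.full(num_patches, float(init_inv_depth))
    return centers, inv_depths, crops
```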

This approach differs from pinhole bundle adjustment in two respects: (a) the projections $\Pi$ and $\Pi^{-1}$ use spherical trigonometry rather than perspective projection; (b) depth is parameterized per sparse patch, not per pixel or mesh vertex.

3. Optimization and Differentiability

Optimization uses an undamped Gauss–Newton solver, typically running 3–5 iterations over Lie algebra pose increments and scalar depth updates. For each pair $(k,j)$, the residual and Jacobians are:

  • $e_{kj} = p_{kj}^* - p_{kj}'$
  • $J_i = \frac{\partial p_{kj}'}{\partial \xi_i} \in \mathbb{R}^{2 \times 6}$
  • $J_j = \frac{\partial p_{kj}'}{\partial \xi_j} \in \mathbb{R}^{2 \times 6}$
  • $J_d = \frac{\partial p_{kj}'}{\partial d_k} \in \mathbb{R}^{2 \times 1}$

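Each edge contributes the following blocks to the Gauss–Newton normal equations (Eq. (9)); summing these contributions over all edges gives the full block-Hessian system:

$$\begin{bmatrix} J_i^\top W J_i & J_i^\top W J_j & J_i^\top W J_d \\ J_j^\top W J_i & J_j^\top W J_j & J_j^\top W J_d \\ J_d^\top W J_i & J_d^\top W J_j & J_d^\top W J_d \end{bmatrix} \begin{bmatrix} \Delta\xi_i \\ \Delta\xi_j \\ \Delta d \end{bmatrix} = \begin{bmatrix} J_i^\top W e \\ J_j^\top W e \\ J_d^\top W e \end{bmatrix}$$

where $W = \Sigma_{kj}^{-1}$; equivalently, each residual and Jacobian is whitened by $w = \sqrt{\Sigma_{kj}^{-1}}$ before forming $J^\top J$ and $J^\top e$.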
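To show how these per-edge blocks accumulate into one global system, here is a dense NumPy sketch. It writes the weight as $W = \Sigma_{kj}^{-1}$ and orders the unknowns as all pose increments followed by all inverse depths; the edge tuple layout is hypothetical, and a practical solver would exploit sparsity instead of storing a dense $H$:

```python
import numpy as np

def assemble(edges, num_poses, num_depths):
    """Accumulate per-edge Gauss-Newton blocks into one dense system H x = b.

    Each edge is a hypothetical tuple (i, j, k, J_i, J_j, J_d, e, W) with
    J_i, J_j: (2, 6), J_d: (2, 1), residual e: (2,), weight W = Sigma^{-1}: (2, 2).
    Unknowns are ordered [xi_0, ..., xi_{P-1}, d_0, ..., d_{N-1}].
    """
    n = 6 * num_poses + num_depths
    H = np.zeros((n, n))
    b = np.zeros(n)
    for i, j, k, J_i, J_j, J_d, e, W in edges:
        # Global indices of the two pose blocks and the single depth entry
        idx = [np.arange(6 * i, 6 * i + 6),
               np.arange(6 * j, 6 * j + 6),
               np.array([6 * num_poses + k])]
        blocks = [J_i, J_j, J_d]
        for a in range(3):
            b[idx[a]] += blocks[a].T @ W @ e
            for c in range(3):
                H[np.ix_(idx[a], idx[c])] += blocks[a].T @ W @ blocks[c]
    return H, b
```

Each edge touches only two pose blocks and one depth entry, so $H$ is sparse even though this sketch stores it densely.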

The Schur complement eliminates $\Delta d$ to obtain the pose increments, after which the depth increments are recovered by back-substitution. Both are then applied:

  • $T_n \leftarrow \exp(\Delta \xi_n)\,T_n$
  • $d_k \leftarrow d_k + \Delta d_k$
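The elimination is cheap because each inverse depth $d_k$ appears only in the residuals of its own patch, so the depth block of the Hessian is diagonal and inverting it is elementwise. A minimal sketch of the reduced solve follows; it assumes the reduced pose system is invertible (in undamped monocular BA the gauge freedom must be removed, for example by holding some poses fixed), and `schur_solve` is a hypothetical name:

```python
import numpy as np

def schur_solve(H, b, num_pose_vars):
    """Solve H [dxi; dd] = b by eliminating the (diagonal) depth block.

    H = [[B, E], [E^T, C]]: B couples poses, C is diagonal over inverse depths.
    """
    p = num_pose_vars
    B, E, C = H[:p, :p], H[:p, p:], H[p:, p:]
    b_pose, b_depth = b[:p], b[p:]
    C_inv = 1.0 / np.diag(C)                  # elementwise inverse of the diagonal
    S = B - (E * C_inv) @ E.T                 # reduced (Schur complement) pose system
    dxi = np.linalg.solve(S, b_pose - E @ (C_inv * b_depth))
    dd = C_inv * (b_depth - E.T @ dxi)        # back-substitute the depth increments
    return dxi, dd
```

Only the $6P \times 6P$ system $S$ is factorized, so the cost is dominated by the number of poses rather than the (much larger) number of patches.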

The Jacobians follow from the chain rule, with $X' = T_{ji}\,\Pi^{-1}(p,d)$:

  • $J_j = \frac{\partial \Pi(X')}{\partial X'} \cdot \frac{\partial X'}{\partial \xi_j}$
  • $J_i = -J_j \cdot \operatorname{Adj}(T_{ji})$
  • $J_d = \frac{\partial \Pi(X')}{\partial X'} \cdot T_{ji} \cdot \frac{\partial \Pi^{-1}(p,d)}{\partial d}$
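The chain-rule expressions can be made concrete as below. Assumptions (not from the source): world-to-camera poses, left perturbations $T \leftarrow \exp(\xi)T$ with $\xi = (\rho,\omega)$, and the adjoint taken of $T_{ji} = T_j T_i^{-1}$, the frame-$i$-to-frame-$j$ transform used in the reprojection. Since $X = \Pi^{-1}(p,d)$ has range $1/d$, $\partial X/\partial d = -X/d$. All names are hypothetical; a central finite-difference check is a cheap way to validate hand-derived Jacobians like these:

```python
import numpy as np

H_IMG, W_IMG = 512, 1024                 # illustrative resolution
FX, FY = W_IMG / (2 * np.pi), -H_IMG / np.pi

def project(X):
    """Pi(X) for the equirectangular model with intrinsics FX, FY."""
    return np.array([FX * np.arctan2(X[0], X[2]) + W_IMG / 2,
                     FY * np.arcsin(X[1] / np.linalg.norm(X)) + H_IMG / 2])

def d_project(X):
    """2x3 Jacobian dPi/dX (singular at the poles, where x = z = 0)."""
    x, y, z = X
    rho2, r2 = x * x + z * z, x * x + y * y + z * z
    rho = np.sqrt(rho2)
    return np.array([
        [FX * z / rho2, 0.0, -FX * x / rho2],                                 # du/dX
        [-FY * x * y / (r2 * rho), FY * rho / r2, -FY * y * z / (r2 * rho)],  # dv/dX
    ])

def hat(w):
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])

def adjoint(T):
    """6x6 adjoint of T in SE(3) for twists ordered (rho, omega)."""
    R, t = T[:3, :3], T[:3, 3]
    A = np.zeros((6, 6))
    A[:3, :3], A[:3, 3:], A[3:, 3:] = R, hat(t) @ R, R
    return A

def jacobians(X_i, d, T_ji):
    """J_i, J_j (2x6) and J_d (2x1) of p' = Pi(T_ji X_i), where X_i = Pi^{-1}(p, d)."""
    R, t = T_ji[:3, :3], T_ji[:3, 3]
    Xp = R @ X_i + t                               # X' in frame j
    dPi = d_project(Xp)
    J_j = dPi @ np.hstack([np.eye(3), -hat(Xp)])   # left perturbation of T_j
    J_i = -J_j @ adjoint(T_ji)                     # perturbing T_i moves X' by -Adj(T_ji) xi
    J_d = dPi @ R @ (-X_i / d)[:, None]            # |X_i| = 1/d, so dX_i/dd = -X_i/d
    return J_i, J_j, J_d
```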

All linear solvers and pose updates are implemented in PyTorch with autograd compatibility, enabling backpropagation of gradients through the residuals, Jacobians, and network modules.

4. Data Flow and Algorithmic Pipeline

Figure 1 in (Guo et al., 5 Jan 2026) depicts the workflow, summarized in the following pseudocode block:

```text
Inputs: frames {I_i, …, I_j}, poses {T_i, …, T_j}, inverse depths {d_1, …, d_N}
f_n, h_n ← SphereResNet(I_n)              for every frame n
{(p_k, d_k, g_k)} ← Patchify(f_i)
E ← build_edges(i, r)                     # keyframe i to frames j in temporal window r
for iter = 1 … M do
  for each (k, j) ∈ E do
    p'_{kj} ← Π(T_{ji} ⋅ Π⁻¹(p_k, d_k))    # spherical reprojection
    hpatch  ← crop(h_j, p'_{kj})
    corr    ← ⟨g_k, hpatch⟩                # patch-to-frame correlation
    p*_{kj} ← p_k + F_rnn(corr)            # learned flow target
    e_{kj}  ← p*_{kj} − p'_{kj}
    compute J_i, J_j, J_d
  end
  assemble H ← Σ JᵀΣ⁻¹J and b ← Σ JᵀΣ⁻¹e over all edges
  solve H [Δξ; Δd] = b via the Schur complement on Δd
  T_n ← exp(Δξ_n) T_n;  d_k ← d_k + Δd_k
end
Output: refined {T_n}, {d_k}
```
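A toy, depth-only instance of this loop can be run end to end. In this hypothetical sketch the poses are held fixed, the learned flow head is replaced by exact target pixels, $J_d$ is approximated by central finite differences, and `iters` plays the role of $M$, so only the scalar Gauss–Newton update per patch remains:

```python
import numpy as np

H_IMG, W_IMG = 512, 1024  # illustrative resolution

def project(X):
    return np.array([W_IMG / (2 * np.pi) * np.arctan2(X[0], X[2]) + W_IMG / 2,
                     -H_IMG / np.pi * np.arcsin(X[1] / np.linalg.norm(X)) + H_IMG / 2])

def unproject(p, d):
    theta = (p[0] - W_IMG / 2) * 2 * np.pi / W_IMG
    phi = (H_IMG / 2 - p[1]) * np.pi / H_IMG
    return np.array([np.cos(phi) * np.sin(theta), np.sin(phi),
                     np.cos(phi) * np.cos(theta)]) / d

def reproject(p, d, T_ji):
    """Pi(T_ji * Pi^{-1}(p, d)) for a 4x4 relative pose T_ji."""
    return project(T_ji[:3, :3] @ unproject(p, d) + T_ji[:3, 3])

def refine_depths(patches, d, targets, T_ji_list, iters=5, eps=1e-6):
    """Undamped Gauss-Newton on per-patch inverse depths with all poses fixed.

    patches: (N, 2) pixels in keyframe i; d: (N,) initial inverse depths;
    targets[k][j]: target pixel p*_kj in frame j (stand-in for the flow head).
    """
    d = np.array(d, dtype=float)
    for _ in range(iters):
        for k, p in enumerate(patches):
            h = g = 0.0
            for j, T_ji in enumerate(T_ji_list):
                e = targets[k][j] - reproject(p, d[k], T_ji)
                # Central finite difference stands in for the analytic J_d
                J = (reproject(p, d[k] + eps, T_ji)
                     - reproject(p, d[k] - eps, T_ji)) / (2 * eps)
                h += J @ J
                g += J @ e
            d[k] += g / h  # with poses fixed, each normal equation is a scalar
    return d
```

With poses held fixed the depth unknowns decouple completely, which is the same structure the Schur complement exploits in the full pose-and-depth solve.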

At each iteration, the network computes patch-to-frame correlations and flow updates, applies spherical reprojection, evaluates residuals, computes Jacobians, constructs the Hessian system, and updates variables.

5. Experimental Validation

Experiments on the real-world 360DVO benchmark and the public synthetic datasets TartanAir V2 and 360VO, including ablation studies on 360DVO, demonstrate ODBA's efficacy:

  • DPVO (pinhole + DBA) on 360DVO: ATE = 7.83 m (Easy), 6.92 m (Hard)
  • ODBA with a standard ResNet in place of SphereResNet: catastrophic failure (ATE = 9.99 m)
  • Pure SphereNet features + ODBA: divergence, indicating instability without residual connections
  • Full 360DVO (SphereResNet + ODBA): ATE = 3.31 m (Easy), 4.35 m (Hard), 56% reduction over pinhole DPVO

On TartanAir V2 and the synthetic 360VO dataset, 360DVO with ODBA reduces ATE by approximately 37.5% and 10%, respectively, over leading baselines including 360VO and OpenVSLAM.

End-to-end training with ODBA is essential; it enables accurate spherical geometry constraints and allows geometric supervision to backpropagate into DAS-Feat and the flow updater, critical for robust optimization in non-perspective geometries.

6. Distinctions and Implications

ODBA extends classical bundle adjustment to omnidirectional imagery by:

  • Employing spherical trigonometry for pixel-to-ray conversion via custom projection heads
  • Optimizing sparse inverse-depths per patch instead of dense pixelwise depth
  • Utilizing end-to-end differentiability through all optimization blocks, facilitating joint learning of features and geometry

A plausible implication is that the interplay of feature learning (via DAS-Feat), robust spherical reprojection, and differentiable optimization constitutes a framework that achieves substantial improvements in robustness and accuracy over non-differentiable, perspective-only BA modules. The necessity of residual connections and spherical-aware networks is evidenced by failed or divergent runs in ablation studies, reinforcing the criticality of architectural choices for ODBA convergence.

In summary, ODBA comprises an omnidirectional, end-to-end differentiable bundle adjustment pipeline that tightly couples learned distortion-resistant features and spherical reprojection geometry via a Lie-algebra-based Gauss–Newton solver, yielding substantial gains in pose and depth estimation accuracy for monocular 360-degree visual odometry (Guo et al., 5 Jan 2026).
