Omnidirectional Differentiable Bundle Adjustment

Updated 6 January 2026

ODBA is a spherical optimization module that refines camera poses and sparse inverse depths for 360° visual odometry using a differentiable Gauss–Newton approach.
It leverages a distortion-aware feature extractor and spherical reprojection models, reducing absolute trajectory error by up to 56% compared to a pinhole DPVO baseline.
The framework supports end-to-end training by integrating geometric supervision with feature learning, ensuring robust performance under aggressive motion and distortions.

Omnidirectional Differentiable Bundle Adjustment (ODBA) is a geometric optimization module designed for accurate pose and depth refinement in monocular omnidirectional visual odometry (OVO) systems leveraging 360-degree cameras. Unlike classical photometric or feature-based approaches, ODBA integrates a distortion-aware spherical feature interface and employs spherical trigonometric camera models, enabling joint optimization of camera poses and sparse landmark inverse depths via a differentiable Gauss–Newton solver with hand-derived Jacobians. This formulation supports end-to-end training, geometric supervision, and robust pose estimation under challenging conditions such as aggressive motion and severe image distortion (Guo et al., 5 Jan 2026).

1. Mathematical Foundations

ODBA models the geometry of a 360-degree equirectangular image as a mapping between image pixels and spherical coordinates. Each pixel $(u,v)$ in an image of size $H \times W$ corresponds to spherical angles $(\theta,\phi)$ . For a 3D point $X=(x,y,z)$ these are $\theta=\arctan2(x,z)$ and $\phi=\arcsin(y/\|X\|)$ , and the angles are mapped to pixels by the intrinsics

$K = \begin{bmatrix} W/(2\pi) & 0 & W/2 \\ 0 & -H/\pi & H/2 \\ 0 & 0 & 1 \end{bmatrix}.$

The forward projection $\Pi(X)$ and its inverse $\Pi^{-1}(p,d)$ follow:

$\Pi(X) = K[\theta;\phi;1]$ , $\theta = \arctan2(x,z)$ , $\phi = \arcsin(y/\|X\|)$
$\Pi^{-1}(p, d) = X = \frac{1}{d}\,[\cos\phi\,\sin\theta;\,\sin\phi;\,\cos\phi\,\cos\theta]$ , with $[\theta;\phi;1] = K^{-1}[u;v;1]$
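
The projection pair can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`intrinsics`, `project`, `unproject`) and the 1024×512 image size are assumptions made for the example. The component order in `unproject` is chosen so that it exactly inverts `project` under $\theta = \arctan2(x,z)$ and $\phi = \arcsin(y/\|X\|)$.

```python
import math

def intrinsics(width, height):
    """Return (fx, fy, cx, cy), the non-trivial entries of K from the text."""
    return width / (2 * math.pi), -height / math.pi, width / 2, height / 2

def project(X, width, height):
    """Pi: map a 3D point (x, y, z) to an equirectangular pixel (u, v)."""
    x, y, z = X
    fx, fy, cx, cy = intrinsics(width, height)
    theta = math.atan2(x, z)                              # longitude
    phi = math.asin(y / math.sqrt(x * x + y * y + z * z))  # latitude
    return fx * theta + cx, fy * phi + cy

def unproject(p, d, width, height):
    """Pi^{-1}: map a pixel (u, v) and inverse depth d back to a 3D point."""
    u, v = p
    fx, fy, cx, cy = intrinsics(width, height)
    theta = (u - cx) / fx
    phi = (v - cy) / fy
    r = 1.0 / d                                           # range along the ray
    return (r * math.cos(phi) * math.sin(theta),
            r * math.sin(phi),
            r * math.cos(phi) * math.cos(theta))

# Round trip: project a point, then recover it from its pixel and inverse depth.
X = (1.0, -0.5, 2.0)
d = 1.0 / math.sqrt(sum(c * c for c in X))
p = project(X, 1024, 512)
X_back = unproject(p, d, 1024, 512)
```

In this sketch a point straight ahead, such as `(0, 0, 1)`, lands on the image center `(W/2, H/2)`, and `X_back` matches `X` up to floating-point error.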

For each patch $k$ with image coordinate $p_k^i$ and inverse-depth $d_k$ , the 3D point corresponding to $p_k^i$ is reprojected from frame $i$ to frame $j$ via $T_{j i} = T_j T_i^{-1}$ :

$p_{k j}' = \Pi\left(T_{j i} \cdot \Pi^{-1}(p_k^i, d_k)\right)$

Simultaneously, the learned flow head predicts an offset for patch matching:

$p_{k j}^* = p_k^i + F_{rnn}\left(\langle g_k, h_j \rangle\right)$

The objective minimized by ODBA is the weighted sum of squared reprojection errors:

$E(\{T\}, d) = \sum_{(k,j)\in E} \| p_{k j}^* - p_{k j}' \|^2_{\Sigma_{k j}}$

where $\Sigma_{k j}$ is a confidence-weighted covariance produced by the RNN updater.
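
As a worked expansion, assuming the subscripted norm follows the usual Mahalanobis convention (the source does not spell this out) and, for the second line only, that $\Sigma_{k j}$ is diagonal with per-axis variances $\sigma_u^2, \sigma_v^2$ (symbols introduced here for illustration):

```latex
\| e \|^2_{\Sigma} = e^\top \Sigma^{-1} e = \big\| \Sigma^{-1/2} e \big\|^2,
\qquad
\Sigma = \mathrm{diag}(\sigma_u^2, \sigma_v^2)
\;\Rightarrow\;
\| e \|^2_{\Sigma} = \frac{e_u^2}{\sigma_u^2} + \frac{e_v^2}{\sigma_v^2}.
```

Under this reading, an edge with high predicted confidence (small variance) contributes more to the objective than an uncertain one.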

2. Parameterization and Feature Interfaces

ODBA jointly refines the following:

Camera poses $T_n\in SE(3)$ , parameterized as 6D Lie algebra increments $\xi_n$
Inverse depths $d_k$ associated with $N$ sparse “patch” landmarks

Rather than dense depth maps, ODBA associates each sparse patch $k$ with a single inverse-depth scalar $d_k$ , initialized from the keyframe’s mean patch depth. The distortion-aware spherical feature extractor (DAS-Feat) computes matching features $f_i \in \mathbb{R}^{H \times W \times 128}$ and context features $h_i \in \mathbb{R}^{H \times W \times 384}$ via SphereResNet. A patchification module selects high-saliency centers $p_k$ by evaluating the gradient magnitude of $f_i$ and crops $3 \times 3$ neighborhoods $g_k \in \mathbb{R}^{3 \times 3 \times 128}$ ; every pixel in a patch inherits the center’s inverse depth $d_k$ . Edges $E$ link each patch from a keyframe $i$ to neighboring frames $j$ in a temporal window.

This approach is distinct from pinhole bundle adjustment in two respects: (a) Projections $\Pi, \Pi^{-1}$ use spherical trigonometry rather than perspective projection. (b) Depth is parameterized per sparse patch, not per pixel or mesh vertex.

3. Optimization and Differentiability

Optimization utilizes a Gauss–Newton solver (without damping), typically running 3–5 iterations over Lie algebra pose increments and scalar depth updates. For each pair $(k,j)$ , residuals and Jacobians are computed:

$e_{k j} = p_{k j}^* - p_{k j}'$
$J_i = \frac{\partial p_{k j}'}{\partial \xi_i} \in \mathbb{R}^{2 \times 6}$
$J_j = \frac{\partial p_{k j}'}{\partial \xi_j} \in \mathbb{R}^{2 \times 6}$
$J_d = \frac{\partial p_{k j}'}{\partial d_k} \in \mathbb{R}^{2 \times 1}$

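The full block-Hessian system (Eq. (9)) is:

$\begin{bmatrix} w J_i^\top J_i & w J_i^\top J_j & w J_i^\top J_d \\ w J_j^\top J_i & w J_j^\top J_j & w J_j^\top J_d \\ w J_d^\top J_i & w J_d^\top J_j & w J_d^\top J_d \end{bmatrix} \begin{bmatrix} \Delta\xi_i \\ \Delta\xi_j \\ \Delta d \end{bmatrix} = \begin{bmatrix} w J_i^\top e \\ w J_j^\top e \\ w J_d^\top e \end{bmatrix}$

where $w = \sqrt{\Sigma_{k j}^{-1}}$ .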

The Schur complement eliminates $\Delta d$ so that the pose increments are solved first; the scalar depth increments then follow by back-substitution, and both are applied as:

$T_n \leftarrow \exp(\Delta \xi_n)\,T_n$
$d_k \leftarrow d_k + \Delta d_k$
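
The elimination step can be written out explicitly. The block names $B$, $E$, $C$, $v$, $u$ below are notation introduced for this sketch and do not appear in the source: $B$ collects the pose-pose blocks of the system above, $E$ the pose-depth blocks, $C$ the depth-depth blocks, and $v$, $u$ the pose and depth parts of the right-hand side.

```latex
\begin{bmatrix} B & E \\ E^\top & C \end{bmatrix}
\begin{bmatrix} \Delta\xi \\ \Delta d \end{bmatrix}
=
\begin{bmatrix} v \\ u \end{bmatrix}
\quad\Longrightarrow\quad
\begin{aligned}
S &= B - E\,C^{-1}E^\top, \\
\Delta\xi &= S^{-1}\big(v - E\,C^{-1}u\big), \\
\Delta d &= C^{-1}\big(u - E^\top \Delta\xi\big).
\end{aligned}
```

Because each patch carries a single inverse depth and each residual depends on only one patch, $C$ is diagonal, so $C^{-1}$ is an elementwise reciprocal and the only dense solve is the small pose system in $S$.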

Jacobian derivations follow, with $X' = T_{ji}\,\Pi^{-1}(p_k^i, d_k)$ the landmark expressed in frame $j$ :

$J_j = \frac{\partial \Pi(X')}{\partial X'} \cdot \frac{\partial X'}{\partial \xi_j}$
$J_i = -J_j\cdot \text{Adj}(T_{ji})$
$J_d = \frac{\partial \Pi(X')}{\partial X'} \cdot T_{ji} \cdot \frac{\partial \Pi^{-1}(p,d)}{\partial d}$
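
The factor $\partial \Pi(X')/\partial X'$ shared by all three Jacobians can be derived by hand from $\theta = \arctan2(x,z)$ and $\phi = \arcsin(y/\|X\|)$. The sketch below is an illustration under those definitions, not the authors' code: the function names and the 1024×512 image size are assumptions. It writes out the closed form and compares it against central finite differences.

```python
import math

def angles(X):
    """Spherical angles (theta, phi) of a 3D point, as defined in the text."""
    x, y, z = X
    return math.atan2(x, z), math.asin(y / math.sqrt(x * x + y * y + z * z))

def projection_jacobian(X, width, height):
    """Closed-form 2x3 Jacobian of the pixel (u, v) with respect to X."""
    x, y, z = X
    fx, fy = width / (2 * math.pi), -height / math.pi
    rho2 = x * x + z * z          # squared distance from the vertical axis
    rho = math.sqrt(rho2)
    r2 = rho2 + y * y             # squared range
    dtheta = (z / rho2, 0.0, -x / rho2)
    dphi = (-x * y / (r2 * rho), rho / r2, -z * y / (r2 * rho))
    return [[fx * g for g in dtheta], [fy * g for g in dphi]]

def numeric_jacobian(X, width, height, h=1e-6):
    """Central-difference approximation of the same Jacobian."""
    fx, fy = width / (2 * math.pi), -height / math.pi
    J = [[0.0] * 3 for _ in range(2)]
    for c in range(3):
        hi, lo = list(X), list(X)
        hi[c] += h
        lo[c] -= h
        (t_hi, p_hi), (t_lo, p_lo) = angles(hi), angles(lo)
        J[0][c] = fx * (t_hi - t_lo) / (2 * h)
        J[1][c] = fy * (p_hi - p_lo) / (2 * h)
    return J

X = (1.0, -0.5, 2.0)
J_analytic = projection_jacobian(X, 1024, 512)
J_numeric = numeric_jacobian(X, 1024, 512)
max_diff = max(abs(a - b)
               for row_a, row_n in zip(J_analytic, J_numeric)
               for a, b in zip(row_a, row_n))
# Moving a point along its own ray must not move its pixel: J @ X == 0.
radial = [sum(J_analytic[r][c] * X[c] for c in range(3)) for r in range(2)]
```

The `radial` check reflects that the spherical projection is invariant to scaling $X$, which is why depth only influences the reprojection through the translation part of $T_{ji}$. The closed form is singular at the poles ($x = z = 0$), which a practical implementation would need to guard against.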

All linear solvers and pose updates are implemented in PyTorch with autograd compatibility, enabling backpropagation of gradients through the residuals, Jacobians, and network modules.

4. Data Flow and Algorithmic Pipeline

Figure 1 in (Guo et al., 5 Jan 2026) depicts the workflow, summarized in the following pseudocode block:

Inputs: {I_i,…,I_j}, {T_i,…,T_j}, {d_1,…,d_N}
f_i, h_i ← SphereResNet(I_i) for all frames
{(p_k, d_k, g_k)} ← Patchify(f_i)
E ← build_edges(i, r)
for iter = 1 ... M do
  for each (k, j) ∈ E do
    p_{k j}' ← Π(T_{ji} ⋅ Π⁻¹(p_k, d_k))
    gpatch ← g_k
    hpatch ← crop(h_j, p_{k j}')
    corr ← ⟨gpatch, hpatch⟩
    p_{k j}* ← p_k + F_rnn(corr)
    e_{k j} ← p_{k j}* − p_{k j}'
    Compute J_i, J_j, J_d
  end
  Assemble H, b via ∑ JᵀJ, Jᵀe
  Solve [H][Δξ;Δd]=b (Schur complement)
  Update poses and depths
end
Output: refined {T_n}, {d_k}
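
To make the inner loop concrete, here is a deliberately reduced toy version in plain Python. Everything beyond the projection formulas is an assumption made for illustration: a single landmark, a known pure-translation $T_{ji}$, a target pixel generated from a ground-truth inverse depth in place of the learned flow head, unit weights, and a finite-difference $J_d$ instead of the hand-derived Jacobian. Only the inverse depth is refined.

```python
import math

W, H = 1024, 512
FX, FY, CX, CY = W / (2 * math.pi), -H / math.pi, W / 2, H / 2

def project(X):
    x, y, z = X
    theta = math.atan2(x, z)
    phi = math.asin(y / math.sqrt(x * x + y * y + z * z))
    return (FX * theta + CX, FY * phi + CY)

def unproject(p, d):
    theta, phi = (p[0] - CX) / FX, (p[1] - CY) / FY
    r = 1.0 / d
    return (r * math.cos(phi) * math.sin(theta),
            r * math.sin(phi),
            r * math.cos(phi) * math.cos(theta))

def reproject(p, d, t):
    """Toy T_ji: identity rotation plus translation t (an assumption)."""
    X = unproject(p, d)
    return project((X[0] + t[0], X[1] + t[1], X[2] + t[2]))

def refine_inverse_depth(p, target, t, d0, iters=10, h=1e-7):
    """Undamped Gauss-Newton on a single scalar inverse depth."""
    d = d0
    for _ in range(iters):
        pred = reproject(p, d, t)
        e = (target[0] - pred[0], target[1] - pred[1])   # e = p* - p'
        hi, lo = reproject(p, d + h, t), reproject(p, d - h, t)
        J = ((hi[0] - lo[0]) / (2 * h), (hi[1] - lo[1]) / (2 * h))
        JtJ = J[0] * J[0] + J[1] * J[1]
        d += (J[0] * e[0] + J[1] * e[1]) / JtJ           # solve JtJ * dd = Jt e
    return d

p_i = (600.0, 200.0)          # patch center in frame i (hypothetical)
t_ji = (0.2, 0.0, 0.05)       # translation from frame i to j (hypothetical)
d_true = 0.5
target = reproject(p_i, d_true, t_ji)
d_est = refine_inverse_depth(p_i, target, t_ji, d0=0.3)
```

Starting from `d0 = 0.3`, the estimate settles on `d_true` within a few iterations because the residual is nearly linear in the inverse depth for small baselines. The full module additionally updates poses, weights each residual, and couples many patches through the Schur-complement solve.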

At each iteration, the network computes patch-to-frame correlations and flow updates, applies spherical reprojection, evaluates residuals, computes Jacobians, constructs the Hessian system, and updates variables.

5. Experimental Validation

Ablation studies conducted on the real-world 360DVO benchmark and public synthetic datasets (TartanAir V2 and 360VO) demonstrate ODBA's efficacy:

DPVO (pinhole + DBA) on 360DVO: ATE = 7.83 m (Easy), 6.92 m (Hard)
ODBA with a standard, non-spherical ResNet: catastrophic failure (ATE = 9.99 m)
Pure SphereNet features + ODBA: divergence, indicating instability without residual connections
Full 360DVO (SphereResNet + ODBA): ATE = 3.31 m (Easy), 4.35 m (Hard), 56% reduction over pinhole DPVO

On the TartanAir V2 and 360VO synthetic datasets, 360DVO with ODBA reduces ATE by approximately 37.5% and 10%, respectively, over leading baselines including 360VO and OpenVSLAM.

End-to-end training with ODBA is essential; it enables accurate spherical geometry constraints and allows geometric supervision to backpropagate into DAS-Feat and the flow updater, critical for robust optimization in non-perspective geometries.

6. Distinctions and Implications

ODBA extends classical bundle adjustment to omnidirectional imagery by:

Employing spherical trigonometry for pixel-to-ray conversion via custom projection heads
Optimizing sparse inverse-depths per patch instead of dense pixelwise depth
Utilizing end-to-end differentiability through all optimization blocks, facilitating joint learning of features and geometry

A plausible implication is that the gains in robustness and accuracy over non-differentiable, perspective-only BA modules come from the combination of feature learning (via DAS-Feat), spherical reprojection, and differentiable optimization rather than from any single component. The failed and divergent runs in the ablation studies indicate that both residual connections and spherical-aware networks are needed for ODBA to converge.

In summary, ODBA is an omnidirectional, end-to-end differentiable bundle adjustment pipeline that tightly couples learned distortion-resistant features and spherical reprojection geometry via a Lie-algebra-based Gauss–Newton solver, yielding substantial gains in pose and depth estimation accuracy for monocular 360-degree visual odometry (Guo et al., 5 Jan 2026).

References (1)

360DVO: Deep Visual Odometry for Monocular 360-Degree Camera (2026)
