DefVINS: VIO in Deformable Scenes

Updated 5 January 2026
  • DefVINS is a visual-inertial odometry framework that fuses rigid, IMU-anchored estimation with embedded deformation graphs to track non-rigid environments.
  • It employs a tightly-coupled sliding-window optimizer with conditioning-driven activation to manage rigid and non-rigid state components effectively.
  • Empirical studies demonstrate up to 80% reduction in trajectory errors under high-deformation conditions compared to traditional VIO systems.

DefVINS is a visual-inertial odometry framework designed to address state estimation in deformable scenes, where classical rigidity assumptions are violated and traditional VIO systems exhibit drift or overfit to non-rigid motion. DefVINS integrates a rigid, IMU-anchored state estimator with a non-rigid deformation module based on an embedded deformation graph. Its architecture, initialization pipeline, mathematical and optimization framework, observability analysis, and conditioning-driven activation strategy collectively enhance robustness and accuracy in environments exhibiting non-rigidity, as validated by quantitative ablation studies and benchmark datasets (Cerezo et al., 2 Jan 2026).

1. System Architecture and Initialization

DefVINS consists of two subsystems operating in a tightly-coupled sliding-window optimization:

  • Rigid, IMU-Anchored Estimator: Processes high-rate IMU data (accelerometer, gyroscope) and keyframe image poses; optimizes for camera poses $\{R_t, p_t\}$, velocities $v_t$, IMU biases $(b^g, b^a)$, and gravity direction $\hat{g}$.
  • Non-Rigid Deformation Module (Embedded Deformation Graph): Builds a sparse deformation graph from long-term feature tracks, estimating per-node 3D positions $\{x_i^t\}$ for each keyframe.

System initialization invokes a standard rigid VIO pipeline akin to VINS-Mono (Qin et al., 2017; Wu, 2019). Initial estimates for scale, gravity direction, gyro bias, accelerometer bias, and velocity are fixed during early keyframes while non-rigid degrees of freedom are "locked". Activation of non-rigid parameters occurs only after the marginal Hessian of the rigid subsystem achieves sufficient conditioning (condition number below a threshold $\lambda_0$), preventing early ill-posedness.

2. Mathematical Formulation

The state vector in DefVINS for a window of $N$ keyframes is:

$$\xi = \Big[ R_{t_k}, v_{t_k}, p_{t_k} \Big]_{k=0}^{N-1} \oplus \Big[ b^g, b^a, \hat{g} \Big] \oplus \xi_\mathrm{NR}$$

where $\xi_\mathrm{NR}$ comprises the deformation graph node positions.
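As a rough illustration of how this state could be laid out in code, the sketch below groups the per-keyframe rigid states, the shared IMU parameters, and the node positions. The class names, the `(N, M, 3)` node array layout, and the choice to count the gravity direction as 2 degrees of freedom (a unit vector) are assumptions made for this example, not details confirmed by the source.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class KeyframeState:
    R: np.ndarray  # 3x3 rotation R_{t_k}
    v: np.ndarray  # velocity v_{t_k}, shape (3,)
    p: np.ndarray  # position p_{t_k}, shape (3,)


@dataclass
class WindowState:
    keyframes: List[KeyframeState]  # N rigid keyframe states
    b_g: np.ndarray                 # gyroscope bias, shape (3,)
    b_a: np.ndarray                 # accelerometer bias, shape (3,)
    g_hat: np.ndarray               # unit gravity direction, shape (3,)
    nodes: np.ndarray               # xi_NR: node positions, assumed shape (N, M, 3)

    def tangent_dim(self, nonrigid_active: bool = True) -> int:
        """Number of free parameters seen by the optimizer."""
        # 9 per keyframe (rotation, velocity, position), 3 + 3 for the biases,
        # and 2 for the gravity direction (assumption: unit-vector parameterization).
        rigid = 9 * len(self.keyframes) + 3 + 3 + 2
        nonrigid = self.nodes.size if nonrigid_active else 0
        return rigid + nonrigid


def make_window(n_keyframes: int, n_nodes: int) -> WindowState:
    """Build an identity-initialized window (hypothetical helper for the example)."""
    kfs = [KeyframeState(np.eye(3), np.zeros(3), np.zeros(3)) for _ in range(n_keyframes)]
    return WindowState(kfs, np.zeros(3), np.zeros(3), np.array([0.0, 0.0, -1.0]),
                       np.zeros((n_keyframes, n_nodes, 3)))
```

With 3 keyframes and 4 nodes, `make_window(3, 4).tangent_dim(False)` counts only the rigid parameters, while passing `True` adds three parameters per node per keyframe.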

Rigid Subsystem

IMU kinematics follow:

$$\dot{R}(t) = R(t)\big(\omega(t) - b^g(t) - n^g(t)\big)^\wedge, \quad \dot{v}(t) = g + R(t)\big(a(t) - b^a(t) - n^a(t)\big), \quad \dot{p}(t) = v(t)$$

Preintegration yields $\Delta\widetilde{R}_{ij}$, $\Delta\widetilde{v}_{ij}$, $\Delta\widetilde{p}_{ij}$ with corresponding residuals:

$$r_{\Delta R} = \mathrm{Log}\left(\Delta\widetilde{R}_{ij}^\top R_i^\top R_j\right)$$

$$r_{\Delta v} = R_i^\top (v_j - v_i - g\Delta T_{ij}) - \Delta\widetilde{v}_{ij}$$

$$r_{\Delta p} = R_i^\top \left(p_j - p_i - v_i\Delta T_{ij} - \frac{1}{2}g\Delta T_{ij}^2\right) - \Delta\widetilde{p}_{ij}$$
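The three preintegration residuals can be evaluated directly from their definitions. The following is a minimal NumPy sketch, assuming the preintegrated deltas are already bias-corrected and using a simple SO(3) logarithm that is not robust near rotations of π; the function names are chosen for this example only.

```python
import numpy as np


def hat(w):
    """Skew-symmetric matrix of a 3-vector (the ^ operator)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def so3_exp(w):
    """Rodrigues formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-10:
        return np.eye(3) + hat(w)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def so3_log(R):
    """Rotation matrix -> rotation vector (not robust near theta = pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-10:
        return 0.5 * vee
    return theta / (2.0 * np.sin(theta)) * vee


def preintegration_residuals(Ri, vi, pi, Rj, vj, pj, dR, dv, dp, g, dT):
    """Evaluate r_dR, r_dv, r_dp between keyframes i and j."""
    r_R = so3_log(dR.T @ Ri.T @ Rj)
    r_v = Ri.T @ (vj - vi - g * dT) - dv
    r_p = Ri.T @ (pj - pi - vi * dT - 0.5 * g * dT**2) - dp
    return r_R, r_v, r_p
```

When the states at keyframe j are propagated exactly from keyframe i with the same deltas, all three residuals vanish; any displacement of $p_j$ shows up in $r_{\Delta p}$ rotated into the frame of keyframe i.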

Non-Rigid Deformation Graph

Node positions are updated per keyframe as $x_i^t$. The deformation of a scene point $X$ is modeled as:

$$W(X; \xi_\mathrm{NR}) = \sum_{i \in \mathcal{N}(X)} w_i(X) \big[X + \delta x_i^t\big]$$

where the weights $w_i(X)$ are Gaussian and normalized over the $K$ nearest nodes.
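A small sketch of this warp is given below. It assumes that $\delta x_i^t$ is the displacement of node $i$ from its rest position ($x_i^t - x_i^0$), that nearest neighbors and weights are computed from distances to the rest positions, and that the Gaussian bandwidth `sigma` and neighbor count `K` are free parameters; none of these choices is confirmed by the source.

```python
import numpy as np


def warp_point(X, nodes_rest, nodes_t, K=4, sigma=0.1):
    """Warp scene point X with Gaussian weights over its K nearest graph nodes.

    nodes_rest: (M, 3) rest positions x_i^0
    nodes_t:    (M, 3) current positions x_i^t
    """
    d2 = np.sum((nodes_rest - X) ** 2, axis=1)   # squared distances to every node
    nn = np.argsort(d2)[:K]                      # indices of the K nearest nodes
    # Subtracting the minimum distance avoids underflow and cancels after normalization.
    w = np.exp(-(d2[nn] - d2[nn].min()) / (2.0 * sigma**2))
    w = w / w.sum()                              # weights sum to one
    delta = nodes_t[nn] - nodes_rest[nn]         # assumed node displacements
    return np.sum(w[:, None] * (X + delta), axis=0)
```

Because the weights sum to one, an undeformed graph leaves the point unchanged, and a uniform translation of all nodes translates the point by the same amount.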

Regularization terms include:

  • Elastic: Maintains rest-lengths:

$$L_{ij}^\mathrm{elas} = k \frac{\left(\|x_i^t - x_j^t\| - \|x_i^0 - x_j^0\|\right)^2}{\|x_i^0 - x_j^0\|}$$

  • Viscous: Penalizes abrupt node motion:

$$L_{ij}^\mathrm{visc} = b_{ij} \|s_i^t - s_j^t\|^2, \quad b_{ij} = \exp\left(-\frac{\|x_i^0 - x_j^0\|^2}{2\sigma^2}\right)$$

  • Photometric: Semi-direct brightness constancy for nodes:

$$L_i^\mathrm{photo} = \left(I^t(u_i^t) - \alpha_i I^{t-1}(u_i^{t-1}) - \beta_i\right)^2$$
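The three terms above translate almost line by line into code. In this sketch the quantity $s_i^t$ is passed in as an opaque per-node vector because the text does not define it, the photometric term takes already-sampled intensities rather than images, and the default values for `k`, `sigma`, `alpha`, and `beta` are placeholders.

```python
import numpy as np


def elastic_term(xi_t, xj_t, xi_0, xj_0, k=1.0):
    """Penalize deviation of the current edge length from its rest length."""
    rest = np.linalg.norm(xi_0 - xj_0)
    current = np.linalg.norm(xi_t - xj_t)
    return k * (current - rest) ** 2 / rest


def viscous_term(si_t, sj_t, xi_0, xj_0, sigma=0.1):
    """Penalize differences in s between nodes, weighted by rest-pose proximity."""
    b_ij = np.exp(-np.sum((xi_0 - xj_0) ** 2) / (2.0 * sigma**2))
    return b_ij * np.sum((si_t - sj_t) ** 2)


def photometric_term(I_t, I_prev, alpha=1.0, beta=0.0):
    """Brightness constancy with affine gain alpha and offset beta.

    I_t and I_prev are intensities already sampled at u_i^t and u_i^{t-1}.
    """
    return (I_t - alpha * I_prev - beta) ** 2
```

For example, an edge with rest length 1 that is stretched to length 2 with `k=3` contributes an elastic cost of 3.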

Joint Cost Function

The total loss for the window is:

$$\mathcal{L}(\xi) = \sum_{k=1}^{N-1} \Big\{ \|r_{\Delta R}^k\|_{\Sigma_R}^2 + \|r_{\Delta v}^k\|_{\Sigma_v}^2 + \|r_{\Delta p}^k\|_{\Sigma_p}^2 + \|r_g^k\|_{\Sigma_g}^2 + \sum_{m \in \mathcal{F}_k} \|r^v_{m, k}\|_{\Sigma_v'}^2 + \lambda_\mathrm{NR} L_\mathrm{NR}^k \Big\} + \mathcal{L}_\mathrm{prior}$$

where $L_\mathrm{NR}^k$ sums elastic, viscous, and photometric regularization over graph edges and nodes.
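To make the structure of this sum concrete, the sketch below accumulates covariance-weighted squared residuals and the weighted non-rigid term per keyframe. It assumes the usual convention $\|r\|_\Sigma^2 = r^\top \Sigma^{-1} r$ and omits robust kernels; the function names and the input layout are hypothetical.

```python
import numpy as np


def sq_mahalanobis(r, Sigma):
    """Squared Mahalanobis norm r^T Sigma^{-1} r (assumed meaning of ||r||^2_Sigma)."""
    return float(r @ np.linalg.solve(Sigma, r))


def window_cost(residuals_per_kf, nonrigid_per_kf, lambda_nr, prior=0.0):
    """Sum weighted residuals and lambda_NR * L_NR^k over the window, plus the prior.

    residuals_per_kf: one list per keyframe of (residual, covariance) pairs,
                      covering the IMU, gravity, and visual terms.
    nonrigid_per_kf:  one scalar L_NR^k per keyframe.
    """
    total = prior
    for residuals, L_nr in zip(residuals_per_kf, nonrigid_per_kf):
        total += sum(sq_mahalanobis(r, S) for r, S in residuals)
        total += lambda_nr * L_nr
    return total
```

Setting `lambda_nr` to zero recovers a purely rigid cost, which mirrors the locked non-rigid state during initialization.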

3. Observability Analysis

DefVINS incorporates a detailed observability assessment:

  • Joint System Rank: Stacking the Jacobians of all residuals produces the observability matrix $\mathcal{O}$. For persistently exciting IMU motion and sufficient deformation graph coverage, the local system is observable up to a four-dimensional gauge freedom (global position and yaw). Non-rigid modes are fully constrained except for this gauge (Cerezo et al., 2 Jan 2026).
  • Role of IMU Anchoring: IMU measurements determine metric scale and gravity direction, preventing deformation nodes from compensating for rigid drift, thus improving identifiability of both rigid and non-rigid state components.

Empirically, the joint system's conditioning (measured by the ratio of the smallest to the largest singular value) improves rapidly when both the IMU and non-rigid modules are activated.
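The idea of reading gauge freedoms and conditioning off a stacked Jacobian can be illustrated with a toy problem. The sketch below is not the DefVINS observability matrix: it uses only relative-position constraints between three poses, so its null space is the three-dimensional global translation rather than the four-dimensional position-and-yaw gauge described above. The function name and the relative tolerance are assumptions.

```python
import numpy as np


def gauge_and_conditioning(J, rel_tol=1e-9):
    """Return (null-space dimension, condition number on the observable subspace)."""
    s = np.linalg.svd(J, compute_uv=False)       # singular values, descending
    rank = int(np.sum(s > rel_tol * s[0]))       # numerically nonzero singular values
    null_dim = J.shape[1] - rank                 # unobservable (gauge) directions
    cond = s[0] / s[rank - 1]                    # largest / smallest nonzero value
    return null_dim, cond


# Toy Jacobian: residuals p_1 - p_0 - z_01 and p_2 - p_1 - z_12 for 3D positions.
chain = np.array([[-1.0, 1.0, 0.0],
                  [0.0, -1.0, 1.0]])
J_toy = np.kron(chain, np.eye(3))                # shape (6, 9)
null_dim, cond = gauge_and_conditioning(J_toy)
```

Here `null_dim` is 3, matching the unobservable global translation, and the condition number on the observable subspace is √3.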

4. Conditioning-Driven Progressive Activation

DefVINS employs a conditioning-based strategy for non-rigid node activation:

  • Activation Metric: The condition number $\kappa(H_\mathrm{NR})$ of the non-rigid Hessian block is computed at each keyframe. New deformation nodes are unlocked and incorporated into the optimization only when $\kappa(H_\mathrm{NR}) < \kappa_\mathrm{thresh}$ (typically $10^8$), ensuring well-posed estimation.
  • Optimization Loop: The sliding-window optimizer uses Ceres Solver (Levenberg–Marquardt, automatic differentiation) with robust Huber kernels, marginalizes the oldest keyframes, and maintains a sparse deformation graph (≈200 nodes). Single-threaded, a window update takes about 20 ms at a keyframe rate of about 10 Hz.
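The activation test itself reduces to a condition-number check on the non-rigid Hessian block. The following is a minimal sketch of that gate, assuming the block is available as a dense NumPy array and that candidate nodes are unlocked all at once when the test passes; the function names and the set-based bookkeeping are illustrative and not taken from the source.

```python
import numpy as np


def nonrigid_block_is_well_conditioned(H_nr, kappa_thresh=1e8):
    """True when the 2-norm condition number of H_NR is below the threshold."""
    return bool(np.linalg.cond(H_nr) < kappa_thresh)


def unlock_nodes(active, candidates, H_nr, kappa_thresh=1e8):
    """Return the updated set of active node ids after the conditioning gate."""
    if nonrigid_block_is_well_conditioned(H_nr, kappa_thresh):
        return set(active) | set(candidates)     # gate passed: add the candidates
    return set(active)                           # gate failed: keep them locked
```

A block such as `np.diag([1.0, 1e-3])` has a condition number of about 1e3 and passes the default gate, while `np.diag([1.0, 1e-10])` does not, so its candidate nodes would stay locked until more constraints accumulate.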

5. Experimental Evaluation

DefVINS was benchmarked on both synthetic and real datasets:

  • Synthetic (Drunkard’s Dataset): 19 RGB-D sequences, 4 deformation levels (L0–L3). Full DefVINS outperformed visual-only, rigid VIO, and NR-SLAM baselines by 30–50% as deformation increased, with errors at L3 (extreme deformation) dropping from 53.1 mm (ORB-SLAM3) to 19.6 mm (DefVINS) (Cerezo et al., 2 Jan 2026).
  • Real RGB-D-IMU: 7 real sequences with varying deformability. DefVINS sustained >80% trajectory coverage under high deformation, while ORB-SLAM3 failed (<20% tracking) in these settings. ATE reduction was approximately 80% for high-deformation cloth sequences.
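For readers who want to reproduce this kind of comparison, the sketch below computes an absolute trajectory error (ATE) as a root-mean-square position error and a relative error reduction. It assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same frame; the alignment step used in standard ATE evaluation is omitted, and the source does not describe its evaluation code.

```python
import numpy as np


def ate_rmse(est_positions, gt_positions):
    """RMSE of per-pose position errors for two (T, 3) trajectories."""
    err = np.linalg.norm(est_positions - gt_positions, axis=1)
    return float(np.sqrt(np.mean(err**2)))


def relative_reduction(baseline_error, improved_error):
    """Fractional error reduction of the improved method over the baseline."""
    return (baseline_error - improved_error) / baseline_error


# The L3 figures quoted above: 53.1 mm (ORB-SLAM3) versus 19.6 mm (DefVINS).
l3_reduction = relative_reduction(53.1, 19.6)    # about 0.63
```

Applied to the L3 numbers, the reduction relative to ORB-SLAM3 is roughly 63%.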

Ablation studies confirmed the necessity of joint rigid/non-rigid estimation: visual-only versions exhibited severe drift; rigid-only IMU stabilization failed to track deformations; full DefVINS achieved consistent global and local accuracy.

6. Context, Limitations, and Prospective Extensions

  • Scope and Limitations: The framework assumes sparse graph structure and well-behaved singular value spectra; extreme under-excitation or poor node distribution may still degrade performance. Deformation node activation depends critically on the conditioning metric and IMU excitation.
  • Potential Extensions: Integrating high-order geometric constraints, denser graph connectivity, adaptive regularization parameters, or multi-modal sensory fusion (e.g., depth, tactile arrays) could further improve performance in severe non-rigid or textured scenes.
  • Comparative Impact: In the reported experiments, DefVINS exceeds the compared baselines in drift suppression and deformation tracking at metric scale. This robustness is attributed to the interplay of IMU anchoring and conditioning-aware non-rigid estimation (Cerezo et al., 2 Jan 2026).

DefVINS defines an optimization-based architecture with an explicit observability analysis for visual-inertial odometry in deformable environments, using conditioning-based control of non-rigid degrees of freedom to obtain low-drift, metric-scale trajectory estimation (Cerezo et al., 2 Jan 2026).
