- The paper introduces DefVINS, which explicitly models non-rigid deformations using an observability-aware activation strategy to maintain metric consistency.
- The paper decouples rigid and non-rigid states by integrating an embedded deformation graph with inertial preintegration, achieving up to 45% error reduction in high-deformation scenarios.
- The paper demonstrates superior robustness and drift minimization in both synthetic and real-world tests, achieving an 80% reduction in trajectory errors under extreme deformation.
Introduction
Classical Visual-Inertial Odometry (VIO) systems, foundational to SLAM and mobile robotics, assume static or rigid environments. Real-world scenes, however, frequently violate this rigidity assumption through non-rigid elements such as cloth, soft tissue, or cables. In such settings, estimators either overfit to local non-rigid motion or drift severely when deformation dominates visual parallax, undermining reliability and accuracy. The paper "DefVINS: Visual-Inertial Odometry for Deformable Scenes" (2601.00702) introduces DefVINS, a VIO system that explicitly models non-rigid deformation while retaining metric consistency via inertial anchoring, guided by a novel observability-aware activation of non-rigid degrees of freedom.
DefVINS departs from classical VIO by decoupling the rigid, IMU-anchored state from the non-rigid scene component. The rigid state follows standard VIO practice, combining inertial preintegration with visual reprojection error, while the non-rigid warp is parameterized by an embedded deformation graph spanning a sliding keyframe window. Deformation nodes are anchored to long-term scene feature tracks, with elasticity and viscosity priors regularizing their spatial and temporal evolution. Photometric consistency is maintained via direct tracking of image features.
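The warp applied by such an embedded deformation graph can be sketched with the standard Sumner-style blending rule: each node carries a local rotation and translation, and a point is warped by a weighted blend of the node transforms. This is a minimal illustration under assumed notation (node positions `g_j`, rotations `R_j`, translations `t_j`, weights `w_j` are generic ED-graph quantities), not the paper's implementation.

```python
import numpy as np

def ed_warp(point, nodes, rotations, translations, weights):
    """Warp a 3D point with an embedded deformation (ED) graph:
    v' = sum_j w_j * (R_j @ (v - g_j) + g_j + t_j)  (Sumner-style blending)."""
    warped = np.zeros(3)
    for g, R, t, w in zip(nodes, rotations, translations, weights):
        warped += w * (R @ (point - g) + g + t)
    return warped

# Two nodes: one at the origin left untouched, one shifted slightly upward.
nodes = [np.zeros(3), np.array([1.0, 0.0, 0.0])]
rotations = [np.eye(3), np.eye(3)]
translations = [np.zeros(3), np.array([0.0, 0.1, 0.0])]
weights = [0.5, 0.5]  # normalized blending weights

p = np.array([0.5, 0.0, 0.0])
print(ed_warp(p, nodes, rotations, translations, weights))  # blends the two node motions
```

With identity rotations, the warped point is simply the weight-blended average of each node's translated copy of the point; spatially varying weights are what let the graph express localized deformation.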
Inertial measurements are integrated via the on-manifold preintegration approach, yielding tight, consistent camera–IMU coupling across all state updates. The resulting cost function jointly minimizes visual, inertial, and deformation-regularization terms within a bounded optimization window, keeping the estimator computationally tractable and consistent.
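The structure of this joint cost can be sketched as a weighted sum of squared residual blocks; a minimal sketch, assuming scalar weights in place of the inverse measurement covariances a real estimator would use, with all names illustrative:

```python
import numpy as np

def windowed_cost(r_visual, r_inertial, r_deform, w_v=1.0, w_i=1.0, w_d=1.0):
    """Joint cost over the sliding window: squared visual reprojection,
    inertial preintegration, and deformation-prior residuals.
    Scalar weights stand in for inverse covariances (illustrative)."""
    return (w_v * float(np.sum(r_visual ** 2))
            + w_i * float(np.sum(r_inertial ** 2))
            + w_d * float(np.sum(r_deform ** 2)))

# Toy residual vectors; a real solver would drive these down with Gauss-Newton.
cost = windowed_cost(np.array([0.1, -0.2]),   # visual: 0.1^2 + 0.2^2
                     np.array([0.3]),          # inertial: 0.3^2
                     np.array([0.0, 0.1]))     # deformation prior: 0.1^2
print(cost)
```

In practice each block would be a sum over features, IMU factors, and graph edges inside the window, minimized with a nonlinear least-squares solver.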
A central contribution of DefVINS is the explicit observability analysis of VIO under non-rigid motion. The paper demonstrates that deformation introduces latent degrees of freedom, leading to ambiguous or ill-conditioned estimates if visual residuals are the sole source of constraint. IMU measurements, despite not directly observing deformation, stabilize the rigid trajectory and improve the conditioning of both rigid and non-rigid subspaces.
Observability is quantitatively assessed by constructing the observability matrix from stacked Jacobians of all measurement and regularization terms. The rank and conditioning of this matrix reveal locally observable directions and indicate when it is statistically safe to activate additional non-rigid degrees of freedom. This principle is operationalized in DefVINS by a conditioning-based activation strategy: new non-rigid parameters are introduced only if the system is sufficiently well-constrained, avoiding overfitting and estimator degeneration.
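The gating logic can be sketched as follows: stack the residual Jacobians, inspect the singular-value spectrum, and only admit new non-rigid parameters when the conditioning is acceptable. The threshold value and function names are assumptions for illustration, not the paper's.

```python
import numpy as np

def conditioning_score(jacobians):
    """Stack measurement/regularizer Jacobians and return log10 of the
    condition number of the resulting observability-style matrix."""
    J = np.vstack(jacobians)
    s = np.linalg.svd(J, compute_uv=False)
    return float(np.log10(s[0] / s[-1]))

def should_activate(jacobians, threshold=6.0):
    """Gate: introduce new non-rigid DoFs only when the current system is
    well-conditioned (illustrative threshold on the log-condition number)."""
    return conditioning_score(jacobians) < threshold

# Toy Jacobians for a 12-dimensional state; random full-rank blocks are
# well-conditioned, so activation is permitted.
rng = np.random.default_rng(0)
J_vis = rng.standard_normal((30, 12))  # visual residual Jacobian (toy)
J_imu = rng.standard_normal((9, 12))   # inertial residual Jacobian (toy)
print(should_activate([J_vis, J_imu]))
```

The key property mirrored here is that adding inertial rows to the stack can only shrink the condition number, which is exactly why IMU factors improve the conditioning of the non-rigid subspace even without observing deformation directly.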
Figure 1: Illustrative observability analysis under synthetic conditions. Evolution of the conditioning score log10(ρk) with increasing numbers of stacked keyframe pairs, demonstrating significant improvements from inertial sensing and non-rigid regularization.
Experimental Evaluation
Synthetic Experiments
DefVINS is evaluated on the Drunkard's Dataset, which provides dense ground-truth trajectories and varying degrees of non-rigid deformation. Key metrics include Absolute Trajectory Error (ATE) and Relative Pose Error (RPE). In low-deformation scenarios, all systems perform comparably. As deformation levels increase, classical rigid baselines (e.g., ORB-SLAM3) and visual-only non-rigid methods (e.g., NR-SLAM) suffer from large ATE and tracking loss. The full DefVINS configuration consistently exhibits the lowest ATE and RPE, with error reductions up to 45% over rigid baselines in high-deformation regimes, and marked robustness in the number of successfully tracked frames.
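For concreteness, ATE is typically reported as the RMSE of per-pose position error after aligning the estimated trajectory to ground truth; a minimal sketch using translation-only alignment (a full evaluation would also solve for rotation and optionally scale, e.g. via Umeyama alignment):

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute Trajectory Error (RMSE) after removing the mean offset.
    Translation-only alignment; rotation/scale alignment omitted for brevity."""
    est = est - est.mean(axis=0)
    gt = gt - gt.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))

gt = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=float)
est = gt + np.array([0.0, 0.1, 0.0])  # constant lateral offset
print(ate_rmse(est, gt))  # → 0.0 (a constant offset vanishes under alignment)
```

RPE, by contrast, compares relative motion between pose pairs at a fixed time or distance offset, making it sensitive to local drift rather than accumulated global error.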
Observability in Practice
Numerical experiments confirm that inertial sensing and visco-elastic regularization dramatically improve the conditioning of the estimation problem, as measured by the singular value spectrum of the observability matrix. Even with as few as 5–10 keyframes, DefVINS achieves well-constrained estimation in both the rigid and non-rigid components, preventing degenerate solutions that would otherwise occur in purely visual or rigid models.
Real-World Sequences
DefVINS is further validated on real-world sequences of a deforming cloth object, using synchronized RGB-D and IMU measurements. The system substantially outperforms rigid and non-rigid baselines in moderate to high deformation scenarios. The full model achieves an 80% reduction in ATE under extreme deformation, with superior long-term tracking resilience.
Figure 2: Qualitative comparison of DefVINS and baseline systems for a challenging deformable sequence. Full DefVINS yields trajectories with the highest global consistency and lowest trajectory error, maintaining alignment with ground-truth despite strong non-rigid motion.
The embedded deformation graph, regularized by spatial elasticity and temporal viscosity, enables fine-grained capture of localized scene changes. In practice, the graph illustrates spatially heterogeneous deformation—regions of high activity are allowed to deviate while globally coherent structure is preserved.
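The two priors act on different axes of the graph, and plausible residual forms can be sketched as follows. These residuals are assumptions chosen to match the stated roles (spatial elasticity between neighboring nodes, temporal viscosity per node over time), not the paper's exact formulation:

```python
import numpy as np

def elasticity_residual(t_i, t_j):
    """Spatial elasticity prior: neighboring graph nodes should deform
    similarly, so penalize the difference of their translations."""
    return t_i - t_j

def viscosity_residual(t_now, t_prev):
    """Temporal viscosity prior: a node's deformation should evolve slowly,
    so penalize the change between consecutive keyframes."""
    return t_now - t_prev

# Neighboring nodes deforming almost identically -> small elastic residual;
# a sudden local jump would inflate it and be damped by the optimizer.
t_a = np.array([0.10, 0.0, 0.0])
t_b = np.array([0.12, 0.0, 0.0])
spatial = float(np.linalg.norm(elasticity_residual(t_a, t_b)))
temporal = float(np.linalg.norm(viscosity_residual(t_a, np.zeros(3))))
print(spatial, temporal)
```

Because the penalties are soft rather than hard constraints, high-activity regions can still deviate when the data demands it, which is what produces the spatially heterogeneous deformation shown in Figure 3.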
Figure 3: Deformation graph visualization for a representative sequence, showing spatially variable non-rigid motion detected and constrained by the system.
Implications and Future Directions
DefVINS advances the state-of-the-art in VIO for dynamic environments by demonstrating that explicit non-rigid modeling, when combined with IMU-based anchoring and observability-aware activation, enables robust, drift-minimal odometry in the presence of severe scene deformation. The approach highlights the necessity of integrating metric-inertial constraints and condition-adapted regularization to circumvent estimator degeneracy in dynamic settings.
Practical implications include deployment in robotics and AR settings where deformable interaction is routine (e.g., manipulation of soft materials, medical navigation). Theoretically, the observability-based gating provides a template for future state estimation systems to safely activate complex state components in a data-driven, statistically justified manner.
Future developments should explore dense deformation parameterizations and further leverage learned priors for complex material dynamics. Integrating robust semantic segmentation or multi-sensor fusion could further improve resilience in ambiguous or sparsely textured domains.
Conclusion
DefVINS establishes a rigorous framework for metric visual-inertial odometry in deformable environments. By coupling IMU-anchored rigidity with observability-gated non-rigid modeling, it achieves high-precision, drift-resistant estimates even under challenging dynamic conditions. The methodology and analysis offer a principled foundation for advancing odometry in non-rigid, real-world scenarios (2601.00702).