
DefVINS: Visual-Inertial Odometry for Deformable Scenes

Published 2 Jan 2026 in cs.RO and cs.CV | (2601.00702v1)

Abstract: Deformable scenes violate the rigidity assumptions underpinning classical visual-inertial odometry (VIO), often leading to over-fitting to local non-rigid motion or severe drift when deformation dominates visual parallax. We introduce DefVINS, a visual-inertial odometry framework that explicitly separates a rigid, IMU-anchored state from a non-rigid warp represented by an embedded deformation graph. The system is initialized using a standard VIO procedure that fixes gravity, velocity, and IMU biases, after which non-rigid degrees of freedom are activated progressively as the estimation becomes well conditioned. An observability analysis is included to characterize how inertial measurements constrain the rigid motion and render otherwise unobservable modes identifiable in the presence of deformation. This analysis motivates the use of IMU anchoring and informs a conditioning-based activation strategy that prevents ill-posed updates under poor excitation. Ablation studies demonstrate the benefits of combining inertial constraints with observability-aware deformation activation, resulting in improved robustness in non-rigid environments.

Summary

  • The paper introduces DefVINS, which explicitly models non-rigid deformations using an observability-aware activation strategy to maintain metric consistency.
  • The paper decouples rigid and non-rigid states by integrating an embedded deformation graph with inertial preintegration, achieving up to 45% error reduction in high-deformation scenarios.
  • The paper demonstrates superior robustness and drift minimization in both synthetic and real-world tests, achieving an 80% reduction in trajectory errors under extreme deformation.


Introduction

Classical Visual-Inertial Odometry (VIO) systems, foundational to SLAM and mobile robotics, assume static or rigid environments. However, real-world scenes often present significant violations of this rigidity assumption due to non-rigid elements such as cloth, soft tissues, or cables. In such contexts, estimation frameworks over-fit to local non-rigid motion or suffer from pronounced drift if deformation dominates visual parallax, leading to a loss of reliability and accuracy. The paper "DefVINS: Visual-Inertial Odometry for Deformable Scenes" (2601.00702) introduces DefVINS, a VIO system that explicitly models non-rigid deformation while retaining metric consistency via inertial anchoring, guided by a novel observability-aware activation of non-rigid degrees of freedom.

System Formulation

DefVINS departs from classical VIO by decoupling the rigid, IMU-anchored state from the non-rigid scene component. The rigid state follows standard VIO practices, incorporating inertial preintegration and visual reprojection error, while the non-rigid warp is parameterized by an embedded deformation graph spanning a sliding keyframe window. Deformation nodes represent long-term scene feature tracks, with elasticity and viscosity priors regularizing their spatial and temporal evolution. Photometric consistency is maintained via direct tracking of image features.
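The rigid/non-rigid split can be sketched with a minimal data structure: a standard VIO rigid state (pose, velocity, IMU biases) alongside an embedded deformation graph whose nodes each carry a local rigid transform and whose warp blends the nearest nodes. The node layout, neighbour count, and Gaussian blending weights below are illustrative assumptions, not the paper's exact parameterization.

```python
# Minimal sketch of the decoupled state (assumptions: k-nearest blending,
# Gaussian weights); the rigid state follows standard VIO conventions.
import numpy as np
from dataclasses import dataclass, field

@dataclass
class RigidState:
    R_wb: np.ndarray = field(default_factory=lambda: np.eye(3))   # body-to-world rotation
    p_wb: np.ndarray = field(default_factory=lambda: np.zeros(3)) # position in world frame
    v_wb: np.ndarray = field(default_factory=lambda: np.zeros(3)) # velocity in world frame
    bg: np.ndarray = field(default_factory=lambda: np.zeros(3))   # gyroscope bias
    ba: np.ndarray = field(default_factory=lambda: np.zeros(3))   # accelerometer bias

class DeformationGraph:
    """Embedded deformation graph: each node carries a local rigid transform."""
    def __init__(self, node_positions, k_neighbors=4, sigma=0.1):
        self.g = np.asarray(node_positions, dtype=float)  # (N, 3) node rest positions
        self.R = np.stack([np.eye(3)] * len(self.g))      # (N, 3, 3) node rotations
        self.t = np.zeros_like(self.g)                    # (N, 3) node translations
        self.k = k_neighbors
        self.sigma = sigma

    def warp_point(self, x):
        """Blend the k nearest node transforms to deform a scene point x."""
        d = np.linalg.norm(self.g - x, axis=1)
        idx = np.argsort(d)[: self.k]
        w = np.exp(-(d[idx] ** 2) / (2 * self.sigma ** 2))
        w /= w.sum() + 1e-12
        warped = np.zeros(3)
        for wi, i in zip(w, idx):
            warped += wi * (self.R[i] @ (x - self.g[i]) + self.g[i] + self.t[i])
        return warped
```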

Inertial measurements are integrated following the on-manifold preintegration approach, ensuring tight and globally consistent coupling between the camera and the IMU across all state updates. The resulting cost function jointly minimizes visual, inertial, and deformation regularization factors within a bounded optimization window, maintaining computational tractability and estimator consistency.
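A minimal sketch of how such a windowed objective might be assembled is shown below. The residual callables and weights are placeholders standing in for the paper's visual reprojection, preintegrated inertial, and visco-elastic regularization factors; the actual factor definitions follow the paper.

```python
# Illustrative assembly of the bounded-window objective (weights and residual
# callables are placeholders, not the paper's exact factor definitions).
def window_cost(keyframes, imu_factors, graph_edges,
                visual_residual, inertial_residual,
                elastic_residual, viscous_residual,
                w_vis=1.0, w_imu=1.0, w_el=0.5, w_vi=0.5):
    cost = 0.0
    for kf in keyframes:                              # visual reprojection terms
        for obs in kf.observations:
            r = visual_residual(kf, obs)
            cost += w_vis * float(r @ r)
    for f in imu_factors:                             # on-manifold preintegration terms
        r = inertial_residual(f)
        cost += w_imu * float(r @ r)
    for e in graph_edges:                             # deformation graph regularizers
        r_el = elastic_residual(e)                    # spatial elasticity between nodes
        r_vi = viscous_residual(e)                    # temporal viscosity on node motion
        cost += w_el * float(r_el @ r_el) + w_vi * float(r_vi @ r_vi)
    return cost
```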

Observability and Condition-Gated Deformation Activation

A central contribution of DefVINS is the explicit observability analysis of VIO under non-rigid motion. The paper demonstrates that deformation introduces latent degrees of freedom, leading to ambiguous or ill-conditioned estimates if visual residuals are the sole source of constraint. IMU measurements, despite not directly observing deformation, stabilize the rigid trajectory and improve the conditioning of both rigid and non-rigid subspaces.

Observability is quantitatively assessed by constructing the observability matrix from stacked Jacobians of all measurement and regularization terms. The rank and conditioning of this matrix reveal locally observable directions and indicate when it is statistically safe to activate additional non-rigid degrees of freedom. This principle is operationalized in DefVINS by a conditioning-based activation strategy: new non-rigid parameters are introduced only if the system is sufficiently well-constrained, avoiding overfitting and estimator degeneration (Figure 1).

Figure 1: Illustrative observability analysis under synthetic conditions. Evolution of the conditioning score log10(ρk) with increasing numbers of stacked keyframe pairs, demonstrating significant improvements from inertial sensing and non-rigid regularization.
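The gating logic can be illustrated with a short numerical sketch: stack the Jacobian blocks for the current window, inspect the singular values, and admit candidate non-rigid columns only while the conditioning score stays above a threshold. The score definition, threshold, and random Jacobians below are assumptions chosen for illustration, not the paper's exact criterion.

```python
# Conditioning-gated activation sketch (score definition and threshold are
# illustrative assumptions).
import numpy as np

def conditioning_score(J, eps=1e-12):
    """Ratio of smallest to largest singular value of the stacked Jacobian."""
    s = np.linalg.svd(J, compute_uv=False)
    return s[-1] / (s[0] + eps)

def maybe_activate_deformation_dofs(J_rigid, J_candidate, rho_min=1e-4):
    """Activate new non-rigid parameters only if the augmented system stays well conditioned."""
    J_aug = np.hstack([J_rigid, J_candidate])  # append candidate non-rigid columns
    rho = conditioning_score(J_aug)
    return rho > rho_min, rho

# Example with a random, well-excited window (illustration only):
rng = np.random.default_rng(0)
J_rigid = rng.standard_normal((120, 15))       # e.g. pose/velocity/bias block
J_cand = rng.standard_normal((120, 6))         # candidate deformation-node block
activate, rho = maybe_activate_deformation_dofs(J_rigid, J_cand)
print(f"log10(rho) = {np.log10(rho):.2f}, activate = {activate}")
```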

Experimental Evaluation

Synthetic Experiments

DefVINS is evaluated on the Drunkard's Dataset, which provides dense ground-truth trajectories and varying degrees of non-rigid deformation. Key metrics include Absolute Trajectory Error (ATE) and Relative Pose Error (RPE). In low-deformation scenarios, all systems perform comparably. As deformation levels increase, classical rigid baselines (e.g., ORB-SLAM3) and visual-only non-rigid methods (e.g., NR-SLAM) suffer from large ATE and tracking loss. The full DefVINS configuration consistently exhibits the lowest ATE and RPE, with error reductions up to 45% over rigid baselines in high-deformation regimes, and marked robustness in the number of successfully tracked frames.
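For reference, the two reported metrics can be computed as in the sketch below: ATE as the RMSE of position residuals after a rigid (Umeyama-style, no-scale) alignment, and a translation-only RPE over a fixed frame offset. This mirrors common evaluation practice and is not tied to the paper's exact evaluation scripts.

```python
# ATE / RPE sketch over (N, 3) position arrays; translation-only RPE is a
# simplification of the full relative-pose metric.
import numpy as np

def align_rigid(est, gt):
    """Least-squares rigid alignment of estimated positions to ground truth (Kabsch)."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_g - R @ mu_e
    return (R @ est.T).T + t

def ate_rmse(est, gt):
    """Absolute Trajectory Error: RMSE of position residuals after alignment."""
    aligned = align_rigid(est, gt)
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))

def rpe_rmse(est, gt, delta=1):
    """Relative Pose Error over translation increments of `delta` frames."""
    d_est = est[delta:] - est[:-delta]
    d_gt = gt[delta:] - gt[:-delta]
    return float(np.sqrt(np.mean(np.sum((d_est - d_gt) ** 2, axis=1))))
```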

Observability in Practice

Numerical experiments confirm that inertial sensing and visco-elastic regularization dramatically improve the conditioning of the estimation problem, as measured by the singular value spectrum of the observability matrix. Even with as few as 5–10 keyframes, DefVINS achieves well-constrained estimation in both the rigid and non-rigid components, preventing degenerate solutions that would otherwise occur in purely visual or rigid models.

Real-World Sequences

DefVINS is further validated on real-world sequences of a deforming cloth object, using synchronized RGB-D and IMU measurements. The system substantially outperforms rigid and non-rigid baselines in moderate to high deformation scenarios. The full model achieves an 80% reduction in ATE under extreme deformation, with superior long-term tracking resilience (Figure 2).

Figure 2: Qualitative comparison of DefVINS and baseline systems for a challenging deformable sequence. Full DefVINS yields trajectories with the highest global consistency and lowest trajectory error, maintaining alignment with ground-truth despite strong non-rigid motion.

Deformation Graph Analysis

The embedded deformation graph, regularized by spatial elasticity and temporal viscosity, enables fine-grained capture of localized scene changes. In practice, the graph illustrates spatially heterogeneous deformation—regions of high activity are allowed to deviate while globally coherent structure is preserved (Figure 3).

Figure 3: Deformation graph visualization for a representative sequence, showing spatially variable non-rigid motion detected and constrained by the system.
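The two priors acting on the graph can be written compactly: an as-rigid-as-possible style elasticity residual between neighbouring nodes, and a viscosity residual damping frame-to-frame node motion. The residual forms below are illustrative assumptions consistent with common embedded-deformation formulations, not necessarily the paper's exact terms.

```python
# Illustrative deformation-graph regularizers (exact forms and weights in the
# paper may differ).
import numpy as np

def elastic_residual(R_i, t_i, g_i, t_j, g_j):
    """Spatial elasticity: neighbour j should follow node i's local rigid prediction."""
    return R_i @ (g_j - g_i) + g_i + t_i - (g_j + t_j)

def viscous_residual(t_now, t_prev):
    """Temporal viscosity: penalize rapid change of a node's translation between frames."""
    return t_now - t_prev
```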

Implications and Future Directions

DefVINS advances the state-of-the-art in VIO for dynamic environments by demonstrating that explicit non-rigid modeling, when combined with IMU-based anchoring and observability-aware activation, enables robust, drift-minimal odometry in the presence of severe scene deformation. The approach highlights the necessity of integrating metric-inertial constraints and condition-adapted regularization to circumvent estimator degeneracy in dynamic settings.

Practical implications include deployment in robotics and AR settings where deformable interaction is routine (e.g., manipulation of soft materials, medical navigation). Theoretically, the observability-based gating provides a template for future state estimation systems to safely activate complex state components in a data-driven, statistically justified manner.

Future developments should explore dense deformation parameterizations and further leverage learned priors for complex material dynamics. Integrating robust semantic segmentation or multi-sensor fusion could further improve resilience in ambiguous or sparsely textured domains.

Conclusion

DefVINS establishes a rigorous framework for metric visual-inertial odometry in deformable environments. By coupling IMU-anchored rigidity with observability-gated non-rigid modeling, it achieves high-precision, drift-resistant estimates—even under challenging dynamic conditions. The methodology and analysis offer a principled foundation for advancing odometry in non-rigid, real-world scenarios (2601.00702).
