- The paper introduces a Hybrid formulation combining object-centric and world-centric approaches to accurately estimate dynamic object trajectories and static environments.
- The proposed method leverages incremental optimization with iSAM2; the Hybrid formulation produces smaller Bayes Tree cliques and faster update times than the world-centric baseline.
- The Parallel-Hybrid architecture decouples static and dynamic estimations, achieving real-time performance with scalability and robustness in cluttered, dynamic scenes.
Online Dynamic SLAM with Incremental Smoothing and Mapping
Introduction and Motivation
The paper addresses the computational challenges inherent in Dynamic Simultaneous Localization and Mapping (SLAM), specifically the joint estimation of static and dynamic scene components in environments with multiple moving objects. Existing dynamic SLAM methods typically rely on batch optimization, which is computationally intensive and unsuitable for real-time applications. The authors propose a novel factor-graph formulation and system architecture that leverages incremental optimization techniques, notably iSAM2, to enable efficient online estimation of camera pose, static structure, and dynamic object motion.
Figure 1: Output of the proposed Dynamic SLAM system in indoor environments with multiple moving objects, showing joint incremental estimation of camera pose, static scene, and dynamic object trajectories.
Dynamic SLAM systems can be categorized into object-centric and world-centric representations. Object-centric methods represent dynamic points in the object’s body frame, reducing the number of state variables but often sacrificing motion accuracy due to implicit rigidity constraints. World-centric methods explicitly model rigid-body kinematics in a global frame, achieving higher accuracy but resulting in densely connected factor graphs that are computationally expensive for batch solvers.
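The following back-of-the-envelope sketch makes the variable-count trade-off concrete for a single rigid object. It assumes, purely for illustration, that every point is tracked in every frame. Under that assumption, the world-centric graph holds one world-frame point variable per point per frame plus one motion per frame transition, and the object-centric graph holds one body-frame point per track plus one object pose per frame. `state_counts` is a hypothetical helper, not something from the paper.

```python
def state_counts(num_points: int, num_frames: int) -> dict:
    """Count dynamic-object state variables for one rigid object.

    Simplifying assumption (illustrative only): each of `num_points`
    points is observed in each of `num_frames` frames.
    """
    if num_points < 0 or num_frames < 1:
        raise ValueError("need num_points >= 0 and num_frames >= 1")
    return {
        # World-centric: a new world-frame point variable per point per frame,
        # plus one rigid-body motion variable per consecutive frame pair.
        "world_centric": num_points * num_frames + (num_frames - 1),
        # Object-centric: each point estimated once in the body frame,
        # plus one object pose variable per frame.
        "object_centric": num_points + num_frames,
    }


print(state_counts(50, 10))
```

For 50 points over 10 frames this gives 509 world-centric variables versus 60 object-centric ones. The world-centric count grows with the product of points and frames, which is one source of the dense connectivity described above.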
Incremental optimization algorithms such as iSAM2 exploit the sparsity of the underlying problem to enable efficient updates. However, naive application of iSAM2 to dynamic SLAM formulations leads to large cliques in the Bayes Tree, causing inefficient computation and poor scalability as the number of dynamic objects increases.
Figure 2: Bayes Tree generated using a world-centric formulation, illustrating large cliques that hinder efficient incremental inference.
The core contribution is a Hybrid representation that combines the benefits of object-centric point representation and world-centric motion modeling. Each dynamic object is anchored to an embedded frame $\{L_e\}$ defined at its first observation. Object points are static in $\{L_e\}$, and object motion is modeled as a relative transformation $T_{W,e\to k}$ transporting $\{L_e\}$ through time. This approach allows the object map to grow naturally as new fragments become visible and enables direct recovery of object pose and velocity at any frame.
Figure 3: Hybrid Dynamic SLAM representation showing object points anchored in an embedded frame and transported through time via world-centric motions.
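Read literally, the Hybrid representation gives the object pose at frame $k$ as $L_k = T_{W,e\to k}\,L_e$. A point stored once as ${}^{L_e}m$ then appears in the world at ${}^{W}m_k = T_{W,e\to k}\,L_e\,{}^{L_e}m$. The sketch below illustrates this bookkeeping in SE(2) rather than SE(3) to stay short. The composition conventions, the finite-difference velocity, and all function names are assumptions for illustration, not the paper's implementation.

```python
import math

# Poses are (x, y, theta) tuples in SE(2); angles are not wrapped here.

def compose(a, b):
    """SE(2) composition a * b."""
    c, s = math.cos(a[2]), math.sin(a[2])
    return (a[0] + c * b[0] - s * b[1], a[1] + s * b[0] + c * b[1], a[2] + b[2])

def inverse(a):
    """SE(2) inverse: rotation R^T and translation -R^T t."""
    c, s = math.cos(a[2]), math.sin(a[2])
    return (-c * a[0] - s * a[1], s * a[0] - c * a[1], -a[2])

def transform_point(a, p):
    """Apply pose a to a 2D point p."""
    c, s = math.cos(a[2]), math.sin(a[2])
    return (a[0] + c * p[0] - s * p[1], a[1] + s * p[0] + c * p[1])

def object_pose(L_e, T_e_to_k):
    """Object pose at frame k: L_k = T_{W,e->k} * L_e."""
    return compose(T_e_to_k, L_e)

def world_point(L_e, T_e_to_k, m_e):
    """World position at frame k of a point stored once in {L_e}."""
    return transform_point(object_pose(L_e, T_e_to_k), m_e)

def frame_to_frame_motion(T_e_to_prev, T_e_to_k):
    """World-frame motion from k-1 to k: T_{W,e->k} * T_{W,e->k-1}^{-1}."""
    return compose(T_e_to_k, inverse(T_e_to_prev))

def object_velocity(L_e, T_e_to_prev, T_e_to_k, dt):
    """Finite-difference linear velocity of the object origin (assumption)."""
    prev = object_pose(L_e, T_e_to_prev)
    curr = object_pose(L_e, T_e_to_k)
    return ((curr[0] - prev[0]) / dt, (curr[1] - prev[1]) / dt)


# An object first seen at (1, 0), later transported by a pure translation.
L_e = (1.0, 0.0, 0.0)
print(world_point(L_e, (2.0, 0.0, 0.0), (0.5, 0.0)))
```

Because each point is stored only once, new points seen on a previously hidden side of the object can simply be added in $\{L_e\}$, while pose and velocity at any frame follow from the motions alone.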
The factor graph construction enforces rigid-body motion via hybrid motion factors and introduces ternary smoothing factors to incentivize physically plausible trajectories. The formulation is designed to preserve sparsity, with dynamic points forming leaf nodes in the graph, enabling efficient variable elimination and incremental updates.
Figure 4: Full Hybrid Dynamic SLAM factor-graph with static points, camera poses, and dynamic objects connected by hybrid motion and smoothing factors.
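The residuals below sketch what a hybrid motion factor and a ternary smoothing factor could look like, again in SE(2) for brevity. Several modelling choices are assumptions. The measurement is a 2D point in the camera frame, and $L_e$ is a fixed constant. The smoothing term is a constant-velocity prior on three consecutive motions $T_{W,e\to k-2}$, $T_{W,e\to k-1}$, $T_{W,e\to k}$, and it uses the $(x, y, \theta)$ components of the relative transform as the error instead of a proper SE(2) logarithm. The paper's exact factor definitions and noise models are not given in this summary.

```python
import math

# Minimal SE(2) helpers; poses are (x, y, theta) tuples.
def compose(a, b):
    c, s = math.cos(a[2]), math.sin(a[2])
    return (a[0] + c * b[0] - s * b[1], a[1] + s * b[0] + c * b[1], a[2] + b[2])

def inverse(a):
    c, s = math.cos(a[2]), math.sin(a[2])
    return (-c * a[0] - s * a[1], s * a[0] - c * a[1], -a[2])

def transform_point(a, p):
    c, s = math.cos(a[2]), math.sin(a[2])
    return (a[0] + c * p[0] - s * p[1], a[1] + s * p[0] + c * p[1])

def wrap(theta):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))

def hybrid_motion_residual(X_k, T_e_to_k, m_e, L_e, z_k):
    """Hypothetical hybrid motion factor on (X_k, T_{W,e->k}, m).

    Predicts the camera-frame point X_k^{-1} * T_{W,e->k} * L_e * m_e
    and returns measurement minus prediction. L_e is held constant.
    """
    world = transform_point(compose(T_e_to_k, L_e), m_e)
    predicted = transform_point(inverse(X_k), world)
    return (z_k[0] - predicted[0], z_k[1] - predicted[1])

def smoothing_residual(T_e_to_km2, T_e_to_km1, T_e_to_k):
    """Assumed ternary constant-velocity factor on three consecutive motions."""
    delta_prev = compose(T_e_to_km1, inverse(T_e_to_km2))  # motion k-2 -> k-1
    delta_curr = compose(T_e_to_k, inverse(T_e_to_km1))    # motion k-1 -> k
    err = compose(inverse(delta_prev), delta_curr)         # identity if constant
    return (err[0], err[1], wrap(err[2]))


print(smoothing_residual((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0)))
```

In this sketch each dynamic point enters only hybrid motion factors, each involving one camera pose and one motion, which is consistent with the leaf-node structure described above.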
Parallel-Hybrid Architecture
To further enhance scalability, the authors propose the Parallel-Hybrid architecture, which partitions the factor graph into a static factor graph (SFG) and multiple dynamic object factor graphs (DOFGs). Each DOFG is conditioned on the current camera pose estimate, fully decoupling dynamic object estimation from the static scene and enabling parallel inference. This architecture maintains separate iSAM2 instances for each graph, with information flow from the SFG to the DOFGs via pose priors.
Figure 5: Parallel-Hybrid approach decoupling static and dynamic components, enabling parallel incremental inference for each dynamic object.
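The one-way information flow of Parallel-Hybrid can be sketched structurally. In this toy version, assumed for illustration only, integration of translation-only 2D odometry stands in for the static graph. Each object graph computes a least-squares (centroid) position of its points per frame, with the camera poses held fixed in place of pose priors. All function names are hypothetical, and a real system would run a separate iSAM2 instance per graph.

```python
from concurrent.futures import ThreadPoolExecutor

def update_static_graph(odometry):
    """Stand-in for the SFG: integrate translation-only odometry into camera positions."""
    poses = [(0.0, 0.0)]
    for dx, dy in odometry:
        x, y = poses[-1]
        poses.append((x + dx, y + dy))
    return poses

def update_object_graph(camera_poses, observations):
    """Stand-in for one DOFG, conditioned on fixed camera poses.

    observations maps frame index -> list of object points in the camera frame.
    Returns the least-squares (centroid) world position of the object per frame.
    """
    track = {}
    for k, points in observations.items():
        cx, cy = camera_poses[k]
        xs = [cx + px for px, _ in points]
        ys = [cy + py for _, py in points]
        track[k] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return track

def parallel_hybrid_step(odometry, object_observations):
    # 1. Solve the static graph first.
    camera_poses = update_static_graph(odometry)
    # 2. Hand the camera estimate to every object graph; objects do not interact.
    with ThreadPoolExecutor() as pool:
        futures = {obj: pool.submit(update_object_graph, camera_poses, obs)
                   for obj, obs in object_observations.items()}
        tracks = {obj: f.result() for obj, f in futures.items()}
    # 3. Nothing flows back from the object graphs to the static graph.
    return camera_poses, tracks
```

`ThreadPoolExecutor` only shows the task structure here. In standard CPython, the global interpreter lock prevents pure-Python work like this from running truly in parallel, so a production system would rely on native threads or processes.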
Experimental Evaluation
Estimation Accuracy
The Hybrid and Parallel-Hybrid methods are evaluated on multiple datasets (KITTI, Outdoor Cluster, OMD, TartanAir, VIODE) against a state-of-the-art world-centric baseline. Both batch and incremental solvers are considered. The Hybrid formulation matches or exceeds the baseline's camera pose and object motion accuracy, although the RMSE differences are marginal. The incremental solvers (iHybrid, Parallel-Hybrid) lose a small amount of accuracy because only part of the state is updated at each step, but they remain competitive.
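The summary does not define the exact error metrics, for example whether trajectories are aligned first or how object motion error is computed. As a reference point, here is a minimal translation-RMSE sketch, $\sqrt{\tfrac{1}{N}\sum_k \lVert t_k^{\mathrm{est}} - t_k^{\mathrm{gt}} \rVert^2}$. It assumes trajectories that are already time-associated and expressed in a common frame. `translation_rmse` is a hypothetical helper.

```python
import math

def translation_rmse(estimated, ground_truth):
    """RMSE of per-frame translation error.

    Assumes both sequences hold (x, y, z) positions that are already
    time-associated and expressed in the same frame (no alignment step).
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [sum((a - b) ** 2 for a, b in zip(e, g))
               for e, g in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))
```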
Scalability and Efficiency
Bayes Tree analysis demonstrates that the Hybrid formulation produces consistently smaller cliques and fewer re-eliminated variables per frame compared to the baseline, validating its suitability for incremental inference.
Figure 6: Bayes Tree evaluation on KITTI 20, showing reduced clique sizes and improved scalability for the Hybrid formulation.
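Clique growth can be illustrated with a small elimination game. Eliminating a variable connects all of its neighbours (fill-in), and the eliminated variable plus its separator gives the size of the resulting conditional, a rough proxy for Bayes Tree clique size. The toy graph here is hypothetical: three camera poses in a chain, with N dynamic points observed at frames 1 and 2 through ternary factors on $X_k$, $T_k$ and a point, and no smoothing factors. It is not the paper's benchmark. It only shows that eliminating dynamic points first, as leaves, keeps conditionals small, while eliminating a motion variable first couples every point it touches. In practice, iSAM2 chooses its own ordering.

```python
from itertools import combinations

def graph_from_factors(factors):
    """Undirected variable graph: variables sharing a factor are adjacent."""
    adjacency = {}
    for factor in factors:
        for v in factor:
            adjacency.setdefault(v, set())
        for a, b in combinations(factor, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def eliminate(adjacency, order):
    """Run the elimination game; return (variable, 1 + |separator|) per step."""
    graph = {v: set(n) for v, n in adjacency.items()}  # copy, input untouched
    sizes = []
    for v in order:
        separator = graph.pop(v)
        sizes.append((v, 1 + len(separator)))
        for u in separator:
            graph[u].discard(v)
            graph[u] |= separator - {u}  # fill-in: separator becomes a clique
    return sizes

def toy_graph(num_points):
    """Hypothetical graph: odometry chain X0-X1-X2, points seen at frames 1 and 2."""
    factors = [("X0", "X1"), ("X1", "X2")]
    for i in range(num_points):
        for k in (1, 2):
            factors.append((f"X{k}", f"T{k}", f"m{i}"))
    return graph_from_factors(factors)


N = 10
points = [f"m{i}" for i in range(N)]
points_first = points + ["T1", "T2", "X0", "X1", "X2"]
motions_first = ["T1", "T2"] + points + ["X0", "X1", "X2"]
print(max(s for _, s in eliminate(toy_graph(N), points_first)))
print(max(s for _, s in eliminate(toy_graph(N), motions_first)))
```

With 10 points, the points-first order never produces a conditional larger than 5 variables regardless of N, while the motions-first order produces one of size N + 2 = 12.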
Parallel-Hybrid maintains bounded update times and constant clique sizes even in sequences with persistent object visibility, highlighting its robustness in highly dynamic environments.
Figure 7: Parallel-Hybrid evaluation on OMD-S4U, showing per-object update time, clique size, and variable counts.
Timing results indicate that Parallel-Hybrid is up to 5× faster than the baseline and 2× more efficient than iHybrid, achieving real-time optimization frequencies (1–5 Hz) across all tested sequences. The baseline fails on long sequences with many objects due to memory exhaustion from large cliques.
Figure 8: Per-frame iSAM2 update time on selected sequences, with failure points highlighted and object counts indicated.
Real-World Deployment
The system is deployed on indoor sequences with multiple non-rigid dynamic objects, achieving online performance with iSAM2 update times under 150 ms and successful estimation of camera and object trajectories in crowded scenes.
Theoretical and Practical Implications
The Hybrid formulation demonstrates that careful factor-graph design can significantly enhance sparsity and scalability in dynamic SLAM, enabling efficient incremental inference. The Parallel-Hybrid architecture offers a practical trade-off between accuracy and computational cost, with decoupling improving efficiency but slightly degrading joint estimation accuracy. The results affirm that joint estimation of static and dynamic components is beneficial for both camera pose and object motion accuracy.
The approach is extensible to scenarios with known object kinematics, and future work should focus on enabling bi-directional information flow between static and dynamic components to further improve estimation quality. The insights into Bayes Tree topology and clique formation are relevant for designing scalable SLAM systems in multi-object and multi-robot settings.
Conclusion
The paper presents a novel Hybrid factor-graph formulation and Parallel-Hybrid architecture for online dynamic SLAM, enabling efficient incremental estimation of camera pose and dynamic object motion. The approach achieves state-of-the-art accuracy and substantial computational speed-ups, with strong scalability in highly dynamic environments. The findings provide a foundation for future research on joint estimation strategies and scalable SLAM architectures for real-world robotic applications.