Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds (2402.10865v1)

Published 16 Feb 2024 in cs.RO and cs.CV

Abstract: We investigate a variation of the 3D registration problem, named multi-model 3D registration. In the multi-model registration problem, we are given two point clouds picturing a set of objects at different poses (and possibly including points belonging to the background) and we want to simultaneously reconstruct how all objects moved between the two point clouds. This setup generalizes standard 3D registration where one wants to reconstruct a single pose, e.g., the motion of the sensor picturing a static scene. Moreover, it provides a mathematically grounded formulation for relevant robotics applications, e.g., where a depth sensor onboard a robot perceives a dynamic scene and has the goal of estimating its own motion (from the static portion of the scene) while simultaneously recovering the motion of all dynamic objects. We assume a correspondence-based setup where we have putative matches between the two point clouds and consider the practical case where these correspondences are plagued with outliers. We then propose a simple approach based on Expectation-Maximization (EM) and establish theoretical conditions under which the EM approach converges to the ground truth. We evaluate the approach in simulated and real datasets ranging from table-top scenes to self-driving scenarios and demonstrate its effectiveness when combined with state-of-the-art scene flow methods to establish dense correspondences.

References
  1. P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “Rgb-d mapping: Using kinect-style depth cameras for dense 3d modeling of indoor environments,” Intl. J. of Robotics Research, vol. 31, no. 5, pp. 647–663, 2012.
  2. G. Blais and M. D. Levine, “Registering multiview range data to create 3d computer objects,” IEEE Trans. Pattern Anal. Machine Intell., vol. 17, no. 8, pp. 820–824, 1995.
  3. S. Choi, Q. Y. Zhou, and V. Koltun, “Robust reconstruction of indoor scenes,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5556–5565.
  4. B. Drost, M. Ulrich, N. Navab, and S. Ilic, “Model globally, match locally: Efficient and robust 3D object recognition,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 998–1005.
  5. J. M. Wong, V. Kee, T. Le, S. Wagner, G. L. Mariottini, A. Schneider, L. Hamilton, R. Chipalkatty, M. Hebert, D. M. S. Johnson et al., “Segicp: Integrated deep semantic segmentation and pose estimation,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS).   IEEE, 2017, pp. 5784–5789.
  6. A. Zeng, K. T. Yu, S. Song, D. Suo, E. Walker, A. Rodriguez, and J. Xiao, “Multi-view self-supervised deep learning for 6d pose estimation in the amazon picking challenge,” in IEEE Intl. Conf. on Robotics and Automation (ICRA).   IEEE, 2017, pp. 1383–1386.
  7. M. A. Audette, F. P. Ferrie, and T. M. Peters, “An algorithmic overview of surface registration techniques for medical imaging,” Med. Image Anal., vol. 4, no. 3, pp. 201–217, 2000.
  8. G. K. L. Tam, Z. Q. Cheng, Y. K. Lai, F. C. Langbein, Y. Liu, D. Marshall, R. R. Martin, X. F. Sun, and P. L. Rosin, “Registration of 3d point clouds and meshes: a survey from rigid to nonrigid.” IEEE Trans. Vis. Comput. Graph., vol. 19, no. 7, pp. 1199–1217, 2013.
  9. J. Bazin, Y. Seo, R. Hartley, and M. Pollefeys, “Globally optimal inlier set maximization with unknown rotation and focal length,” in European Conf. on Computer Vision (ECCV), 2014, pp. 803–817.
  10. G. Wahba, “A least squares estimate of satellite attitude,” SIAM Review, vol. 7, no. 3, p. 409, 1965.
  11. K. Arun, T. Huang, and S. Blostein, “Least-squares fitting of two 3-D point sets,” IEEE Trans. Pattern Anal. Machine Intell., vol. 9, no. 5, pp. 698–700, Sept. 1987.
  12. B. K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” J. Opt. Soc. Amer., vol. 4, no. 4, pp. 629–642, Apr 1987.
  13. H. Yang, J. Shi, and L. Carlone, “TEASER: Fast and Certifiable Point Cloud Registration,” IEEE Trans. Robotics, vol. 37, no. 2, pp. 314–333, 2020, extended arXiv version: 2001.07715.
  14. J. L. Barron, D. J. Fleet, and S. S. Beauchemin, “Performance of optical flow techniques,” Intl. J. of Computer Vision, vol. 12, no. 1, pp. 43–77, 1994.
  15. S. Vedula, P. Rander, R. Collins, and T. Kanade, “Three-dimensional scene flow,” IEEE Trans. Pattern Anal. Machine Intell., vol. 27, no. 3, pp. 475–480, 2005.
  16. L. Peng, C. Kümmerle, and R. Vidal, “On the convergence of IRLS and its variants in outlier-robust estimation,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 17808–17818.
  17. A. Barik and J. Honorio, “Outlier-robust estimation of a sparse linear model using invexity,” 2023.
  18. L. Carlone, “Estimation contracts for outlier-robust geometric perception,” Foundations and Trends (FnT) in Robotics, arXiv preprint: 2208.10521, 2023.
  19. K. M. Tavish and T. D. Barfoot, “At all costs: A comparison of robust cost functions for camera correspondence outliers,” in Conf. Computer and Robot Vision.   IEEE, 2015, pp. 62–69.
  20. M. J. Black and A. Rangarajan, “On the unification of line processes, outlier rejection, and robust statistics with applications in early vision,” Intl. J. of Computer Vision, vol. 19, no. 1, pp. 57–91, 1996.
  21. H. Yang, P. Antonante, V. Tzoumas, and L. Carlone, “Graduated non-convexity for robust spatial perception: From non-minimal solvers to global outlier rejection,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1127–1134, 2020, arXiv preprint: 1909.08605 (with supplemental material).
  22. H. Yang and L. Carlone, “Certifiably optimal outlier-robust geometric perception: Semidefinite relaxations and scalable global optimization,” IEEE Trans. Pattern Anal. Machine Intell., 2022.
  23. T.-J. Chin, Z. Cai, and F. Neumann, “Robust fitting in computer vision: Easy or hard?” in European Conf. on Computer Vision (ECCV), 2018.
  24. P. Antonante, V. Tzoumas, H. Yang, and L. Carlone, “Outlier-robust estimation: Hardness, minimally tuned algorithms, and applications,” IEEE Trans. Robotics, vol. 38, no. 1, pp. 281–301, 2021.
  25. M. Fischler and R. Bolles, “Random sample consensus: a paradigm for model fitting with application to image analysis and automated cartography,” Commun. ACM, vol. 24, pp. 381–395, 1981.
  26. J. Shi, H. Yang, and L. Carlone, “ROBIN: a graph-theoretic approach to reject outliers in robust estimation using invariants,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2021, arXiv preprint: 2011.03659.
  27. A. P. Bustos, T.-J. Chin, F. Neumann, T. Friedrich, and M. Katzmann, “A practical maximum clique algorithm for matching with pairwise constraints,” arXiv preprint arXiv:1902.01534, 2019.
  28. O. Enqvist, K. Josephson, and F. Kahl, “Optimal correspondences from pairwise constraints,” in Intl. Conf. on Computer Vision (ICCV), 2009, pp. 1295–1302.
  29. M. Bosse, G. Agamennoni, and I. Gilitschenski, “Robust estimation and applications in robotics,” Foundations and Trends in Robotics, vol. 4, no. 4, pp. 225–269, 2016.
  30. M. Charikar, J. Steinhardt, and G. Valiant, “Learning from untrusted data,” in Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, ser. STOC 2017, 2017, pp. 47–60.
  31. S. Karmalkar, A. Klivans, and P. Kothari, “List-decodable linear regression,” in Advances in Neural Information Processing Systems (NIPS), vol. 32, 2019.
  32. P. Raghavendra and M. Yau, “List decodable learning via sum of squares,” in Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA ’20, 2020, pp. 161–180.
  33. I. Diakonikolas, D. Kane, and D. Kongsgaard, “List-decodable mean estimation via iterative multi-filtering,” Advances in Neural Information Processing Systems, vol. 33, pp. 9312–9323, 2020.
  34. Y. Cherapanamjeri, S. Mohanty, and M. Yau, “List decodable mean estimation in nearly linear time,” in 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS).   IEEE, 2020, pp. 141–148.
  35. L. Magri and A. Fusiello, “T-linkage: A continuous relaxation of j-linkage for multi-model fitting,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3954–3961.
  36. R. Toldo and A. Fusiello, “Robust multiple structures estimation with j-linkage,” in Computer Vision – ECCV 2008, D. Forsyth, P. Torr, and A. Zisserman, Eds.   Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 537–547.
  37. T.-J. Chin, H. Wang, and D. Suter, “Robust fitting of multiple structures: The statistical learning approach,” in 2009 IEEE 12th International Conference on Computer Vision, 2009, pp. 413–420.
  38. T.-J. Chin, D. Suter, and H. Wang, “Multi-structure model selection via kernel optimisation,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 3586–3593.
  39. L. Magri and A. Fusiello, “Multiple structure recovery via robust preference analysis,” Image and Vision Computing, vol. 67, pp. 1–15, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S026288561730152X
  40. M. Tepper and G. Sapiro, “Nonnegative matrix underapproximation for robust multiple model fitting,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 655–663.
  41. S. Lin, G. Xiao, Y. Yan, D. Suter, and H. Wang, “Hypergraph optimization for multi-structural geometric model fitting,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 8730–8737, Jul. 2019. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/4897
  42. P. Purkait, T.-J. Chin, A. Sadri, and D. Suter, “Clustering with hypergraphs: The case for large hyperedges,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 9, pp. 1697–1711, 2017.
  43. P. H. Torr, “Geometric motion segmentation and model selection,” Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 356, no. 1740, pp. 1321–1340, 1998.
  44. M. Zuliani, C. Kenney, and B. Manjunath, “The multiransac algorithm and its application to detect planar homographies,” in IEEE International Conference on Image Processing 2005, vol. 3, 2005, pp. III–153.
  45. L. Magri and A. Fusiello, “Multiple models fitting as a set coverage problem,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3318–3326.
  46. H. Isack and Y. Boykov, “Energy-based geometric multi-model fitting,” International Journal of Computer Vision, vol. 97, pp. 123–147, Apr. 2012.
  47. D. Baráth and J. Matas, “Progressive-x: Efficient, anytime, multi-model fitting algorithm,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 3779–3787.
  48. X. Yi, C. Caramanis, and S. Sanghavi, “Alternating minimization for mixed linear regression,” in International Conference on Machine Learning.   PMLR, 2014, pp. 613–621.
  49. H. Sedghi, M. Janzamin, and A. Anandkumar, “Provable tensor methods for learning mixtures of generalized linear models,” in Artificial Intelligence and Statistics.   PMLR, 2016, pp. 1223–1231.
  50. Y. Li and Y. Liang, “Learning mixtures of linear regressions with nearly optimal complexity,” in Conference On Learning Theory.   PMLR, 2018, pp. 1125–1144.
  51. X. Yi, C. Caramanis, and S. Sanghavi, “Solving a mixture of many random linear equations by tensor decomposition and alternating minimization,” CoRR, vol. abs/1608.05749, 2016. [Online]. Available: http://arxiv.org/abs/1608.05749
  52. S. Faria and G. Soromenho, “Fitting mixtures of linear regressions,” Journal of Statistical Computation and Simulation, vol. 80, no. 2, pp. 201–225, 2010. [Online]. Available: https://doi.org/10.1080/00949650802590261
  53. J. M. Klusowski, D. Yang, and W. D. Brinda, “Estimating the coefficients of a mixture of two linear regressions by expectation maximization,” IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3515–3524, 2019.
  54. J. Kwon and C. Caramanis, “Em converges for a mixture of many linear regressions,” in Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, S. Chiappa and R. Calandra, Eds., vol. 108.   PMLR, 26–28 Aug 2020, pp. 1727–1736. [Online]. Available: https://proceedings.mlr.press/v108/kwon20a.html
  55. Z. Teed and J. Deng, “Raft: Recurrent all-pairs field transforms for optical flow,” in Computer Vision – ECCV 2020, A. Vedaldi, H. Bischof, T. Brox, and J.-M. Frahm, Eds.   Cham: Springer International Publishing, 2020, pp. 402–419.
  56. E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “Flownet 2.0: Evolution of optical flow estimation with deep networks,” CoRR, vol. abs/1612.01925, 2016. [Online]. Available: http://arxiv.org/abs/1612.01925
  57. W.-C. Ma, S. Wang, R. Hu, Y. Xiong, and R. Urtasun, “Deep rigid instance scene flow,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 3609–3617.
  58. G. Yang and D. Ramanan, “Learning to segment rigid motions from two frames,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 1266–1275.
  59. Z. Teed and J. Deng, “Raft-3d: Scene flow using rigid-motion embeddings,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 8371–8380.
  60. H. Liu, T. Lu, Y. Xu, J. Liu, W. Li, and L. Chen, “Camliflow: Bidirectional camera-lidar fusion for joint optical flow and scene flow estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5791–5801.
  61. H. Liu, T. Lu, Y. Xu, J. Liu, and L. Wang, “Learning optical flow and scene flow with bidirectional camera-lidar fusion,” 2023.
  62. T. K. Moon, “The expectation-maximization algorithm,” IEEE Signal Processing Magazine, vol. 13, no. 6, pp. 47–60, 1996.
  63. B. Eckart, K. Kim, and J. Kautz, “Hgmr: Hierarchical gaussian mixtures for adaptive 3d registration,” in Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XV.   Berlin, Heidelberg: Springer-Verlag, 2018, p. 730–746. [Online]. Available: https://doi.org/10.1007/978-3-030-01267-0_43
  64. J. G. Rogers, A. J. Trevor, C. Nieto-Granda, and H. I. Christensen, “Slam with expectation maximization for moveable object tracking,” in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.   IEEE, 2010, pp. 2077–2082.
  65. V. Indelman, E. Nelson, N. Michael, and F. Dellaert, “Multi-robot pose graph localization and data association from unknown initial relative poses via expectation maximization,” in 2014 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2014, pp. 593–600.
  66. S. Bowman, N. Atanasov, K. Daniilidis, and G. Pappas, “Probabilistic data association for semantic SLAM,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2017, pp. 1722–1729.
  67. A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” arXiv preprint arXiv:2304.02643, 2023.
  68. Y. Xiang, R. Mottaghi, and S. Savarese, “Beyond pascal: A benchmark for 3d object detection in the wild,” in IEEE Winter Conf. on Appl. of Computer Vision.   IEEE, 2014, pp. 75–82.
  69. N. Mayer, E. Ilg, P. Hausser, P. Fischer, D. Cremers, A. Dosovitskiy, and T. Brox, “A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4040–4048.
  70. A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The kitti dataset,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.
  71. R. Hartley, J. Trumpf, Y. Dai, and H. Li, “Rotation averaging,” IJCV, vol. 103, no. 3, pp. 267–305, 2013.
  72. Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3d shapenets: A deep representation for volumetric shapes,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1912–1920.
  73. M. Menze and A. Geiger, “Object scene flow for autonomous vehicles,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3061–3070.

Summary

  • The paper introduces an EM-based method that recovers the motions of multiple objects without prior knowledge of their count.
  • The methodology leverages iterative clustering and expectation-maximization to refine object poses amid noise and clutter.
  • Empirical evaluations show superior accuracy in dynamic, real-world scenarios compared to conventional registration techniques.

Multi-Model 3D Registration Through Expectation-Maximization

Introduction to Multi-Model 3D Registration

The problem of 3D registration is central to robotics and computer vision, underpinning applications such as motion estimation, object pose estimation, and medical imaging. Traditionally, 3D registration seeks the single rotation and translation that align two point clouds, i.e., it reconstructs a single pose. This research ventures into the more complex territory of multi-model 3D registration, where the objective is to recover the motion of multiple objects between two point clouds that may also contain background points. This variant not only generalizes standard 3D registration but also matches practical scenarios, such as a robot whose depth sensor perceives a cluttered, dynamic scene and must estimate its own motion while simultaneously recovering the motion of every dynamic object.
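
For concreteness, standard single-model registration can be written as the classic least-squares problem (notation ours, in the spirit of Arun et al. and Horn, refs. 11 and 12 below):

$$
\min_{R \in \mathrm{SO}(3),\; t \in \mathbb{R}^3} \;\; \sum_{i=1}^{N} \left\| y_i - (R\,x_i + t) \right\|^2
$$

where $(x_i, y_i)$ are corresponding points in the two clouds. Multi-model registration replaces the single pair $(R, t)$ with an unknown collection of motions $\{(R_k, t_k)\}_{k=1}^{K}$, together with an assignment of each correspondence to one of the $K$ objects (or to the outlier set).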

Robust 3D Registration

The paper develops a robust formulation of multi-model 3D registration, acknowledging that real-world measurements are contaminated with outliers. To cope with this, the authors propose an Expectation-Maximization (EM) based method that operates on putative matches between the point clouds even when a significant fraction of them are outliers. The approach iteratively computes the assignments of measurements to objects, yielding a practical framework that reconstructs the pose of each object without prior knowledge of how many objects are present.
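
One concrete building block such an approach needs is a closed-form pose solver for a single object given (possibly soft) point-to-object assignments. The sketch below is our own minimal NumPy illustration of the weighted SVD-based solver that an EM scheme of this kind would call in its M-step; the function name and the weighting convention are our assumptions, not the paper's code.

```python
import numpy as np

def weighted_pose_svd(x, y, w):
    """Least-squares rigid transform: argmin over (R, t) of sum_i w_i * ||y_i - (R x_i + t)||^2.

    x, y: (N, 3) arrays of corresponding points; w: (N,) non-negative weights.
    Closed-form SVD solution in the style of Arun et al. (ref. 11), with per-point weights.
    """
    w = w / w.sum()                                   # normalize weights to sum to 1
    mu_x, mu_y = w @ x, w @ y                         # weighted centroids
    H = (x - mu_x).T @ (w[:, None] * (y - mu_y))      # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_y - R @ mu_x
    return R, t
```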

Expectation-Maximization Approach

The EM algorithm alternates between refining the assignments of points to objects and re-estimating the pose of each object. Starting from an initial guess of the object clusters, each iteration performs an expectation step that updates the cluster assignments and a maximization step that recomputes the pose and residual variance of each cluster. A key insight is that a good initial clustering, obtainable via simple Euclidean clustering or learned segmenters such as Segment Anything (SAM), is pivotal for the EM iterations to converge to the ground-truth clusters.
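
To make the alternation concrete, here is a minimal EM loop in the same spirit, reusing the hypothetical weighted_pose_svd helper from the sketch above. This is our simplified rendering rather than the paper's implementation: it assumes isotropic Gaussian residuals, uniform mixing weights, a fixed number of clusters taken from the initial labels, and no explicit outlier component.

```python
def em_multi_model(x, y, labels0, n_iters=50, eps=1e-9):
    """Sketch of EM for multi-model registration (our illustration, not the authors' code).

    x, y: (N, 3) corresponding points; labels0: (N,) initial hard cluster labels in {0..K-1}.
    Alternates soft assignments (E-step) with per-cluster pose/variance fits (M-step).
    """
    K = int(labels0.max()) + 1
    resp = np.eye(K)[labels0]                          # (N, K) one-hot initial responsibilities
    for _ in range(n_iters):
        # M-step: weighted pose and residual variance for each cluster.
        poses, var = [], np.empty(K)
        for k in range(K):
            R, t = weighted_pose_svd(x, y, resp[:, k] + eps)
            r2 = np.sum((y - (x @ R.T + t)) ** 2, axis=1)        # squared residuals under pose k
            var[k] = (resp[:, k] @ r2) / (3.0 * resp[:, k].sum() + eps)
            poses.append((R, t))
        # E-step: responsibilities from isotropic-Gaussian log-likelihoods.
        log_lik = np.stack(
            [-0.5 * np.sum((y - (x @ R.T + t)) ** 2, axis=1) / (v + eps) - 1.5 * np.log(v + eps)
             for (R, t), v in zip(poses, var)], axis=1)
        log_lik -= log_lik.max(axis=1, keepdims=True)            # stabilize the softmax
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
    return poses, resp
```

A full-fledged version would also carry an outlier class with a broad (or uniform) density and prune clusters whose total responsibility collapses, which is closer in spirit to the robust formulation the paper describes.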

Theoretical Analysis and Practical Implications

A novel theoretical analysis establishes conditions on the initial clustering under which the EM scheme provably recovers the true motion of all objects of interest. This convergence guarantee underpins the method's effectiveness and lays a theoretical foundation for further study.
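
One natural way to formalize the objective that such an analysis studies (our formalization, not a quotation of the paper's theorem) is the mixture log-likelihood that EM locally ascends:

$$
\max_{\{(R_k,\, t_k,\, \sigma_k)\}} \;\; \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left( y_i \,;\, R_k x_i + t_k,\; \sigma_k^2 I_3 \right)
$$

Read this way, the conditions on the initial clustering amount to requiring that EM start inside the basin of attraction of the ground-truth optimum rather than a spurious local maximum.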

Empirical Evaluation

The approach is evaluated against state-of-the-art methods on datasets ranging from synthetic table-top scenes to real self-driving scenarios. The results show notable effectiveness, particularly in scenes complicated by noise or by distinct objects moving with nearly identical motions. In both object pose estimation accuracy and clustering quality, the method outperforms traditional baselines such as Sequential RANSAC and T-Linkage.

Future Directions in AI and Robotics

This work opens several avenues for future research. The success of the EM algorithm in multi-model 3D registration could inspire similar approaches to other complex registration problems, while integrating learning-based methods for initial clustering, or richer models for handling outlier correspondences, offers clear room for enhancement. Beyond marking a step forward in 3D registration itself, the work lays a solid foundation for dynamic scene understanding and reconstruction in robotics and computer vision.

In summary, this work addresses the critical challenge of multi-model 3D registration with an EM-based approach, fortified by rigorous theoretical analysis and compelling empirical evidence. The implications for practical robotics applications are profound, paving the way for more accurate and robust methods of understanding and interacting with dynamic environments.
