Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds (2402.10865v1)
Abstract: We investigate a variation of the 3D registration problem, named multi-model 3D registration. In the multi-model registration problem, we are given two point clouds picturing a set of objects at different poses (and possibly including points belonging to the background) and we want to simultaneously reconstruct how all objects moved between the two point clouds. This setup generalizes standard 3D registration, where one wants to reconstruct a single pose, e.g., the motion of the sensor picturing a static scene. Moreover, it provides a mathematically grounded formulation for relevant robotics applications, e.g., where a depth sensor onboard a robot perceives a dynamic scene and has the goal of estimating its own motion (from the static portion of the scene) while simultaneously recovering the motion of all dynamic objects. We assume a correspondence-based setup where we have putative matches between the two point clouds and consider the practical case where these correspondences are plagued with outliers. We then propose a simple approach based on Expectation-Maximization (EM) and establish theoretical conditions under which the EM approach converges to the ground truth. We evaluate the approach on simulated and real datasets ranging from table-top scenes to self-driving scenarios and demonstrate its effectiveness when combined with state-of-the-art scene flow methods to establish dense correspondences.
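The EM scheme the abstract describes can be sketched concretely: soft-assign each putative correspondence to one of K rigid motions (or an outlier class) in the E-step, then refit each motion by a responsibility-weighted Procrustes solve (the closed form of Arun et al. / Horn, cited below) in the M-step. The following is a minimal numpy sketch, not the paper's implementation: the fixed-sigma Gaussian residual model, the uniform outlier likelihood, and all names are my assumptions, and initial motion guesses are taken as given (in practice they could come from random restarts or scene-flow correspondences).

```python
import numpy as np

def weighted_procrustes(P, Q, w):
    """Weighted least-squares rigid fit: R, t minimizing sum_i w_i ||q_i - R p_i - t||^2,
    via the SVD closed form (Arun et al. / Horn). P, Q are (N, 3); w is (N,)."""
    w = w / w.sum()
    mu_p, mu_q = w @ P, w @ Q                 # weighted centroids
    Pc, Qc = P - mu_p, Q - mu_q
    H = (w[:, None] * Pc).T @ Qc              # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # reflection guard
    R = Vt.T @ D @ U.T
    return R, mu_q - R @ mu_p

def em_multi_model(P, Q, models, iters=30, sigma=0.05, outlier_lik=1e-3):
    """EM for multi-model registration: K rigid motions plus a uniform outlier class.
    P, Q: (N, 3) putative correspondences; models: initial list of (R, t) guesses.
    Returns refined models and an (N, K+1) responsibility matrix (last column = outlier)."""
    N, K = P.shape[0], len(models)
    for _ in range(iters):
        # E-step: soft-assign each correspondence to a motion or to the outlier class
        lik = np.full((N, K + 1), outlier_lik)
        for k, (R, t) in enumerate(models):
            r2 = np.sum((Q - (P @ R.T + t)) ** 2, axis=1)
            lik[:, k] = np.exp(-r2 / (2.0 * sigma**2))
        resp = lik / lik.sum(axis=1, keepdims=True)
        # M-step: refit each motion by responsibility-weighted Procrustes
        models = [weighted_procrustes(P, Q, resp[:, k] + 1e-9) for k in range(K)]
    return models, resp
```

On well-separated motions this alternation recovers both transformations while pushing gross outliers to the last responsibility column; the theoretical conditions for convergence to the ground truth are the subject of the paper itself.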
- P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments,” Intl. J. of Robotics Research, vol. 31, no. 5, pp. 647–663, 2012.
- G. Blais and M. D. Levine, “Registering multiview range data to create 3d computer objects,” IEEE Trans. Pattern Anal. Machine Intell., vol. 17, no. 8, pp. 820–824, 1995.
- S. Choi, Q. Y. Zhou, and V. Koltun, “Robust reconstruction of indoor scenes,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5556–5565.
- B. Drost, M. Ulrich, N. Navab, and S. Ilic, “Model globally, match locally: Efficient and robust 3D object recognition,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 998–1005.
- J. M. Wong, V. Kee, T. Le, S. Wagner, G. L. Mariottini, A. Schneider, L. Hamilton, R. Chipalkatty, M. Hebert, D. M. S. Johnson et al., “SegICP: Integrated deep semantic segmentation and pose estimation,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 5784–5789.
- A. Zeng, K. T. Yu, S. Song, D. Suo, E. Walker, A. Rodriguez, and J. Xiao, “Multi-view self-supervised deep learning for 6d pose estimation in the amazon picking challenge,” in IEEE Intl. Conf. on Robotics and Automation (ICRA). IEEE, 2017, pp. 1383–1386.
- M. A. Audette, F. P. Ferrie, and T. M. Peters, “An algorithmic overview of surface registration techniques for medical imaging,” Med. Image Anal., vol. 4, no. 3, pp. 201–217, 2000.
- G. K. L. Tam, Z. Q. Cheng, Y. K. Lai, F. C. Langbein, Y. Liu, D. Marshall, R. R. Martin, X. F. Sun, and P. L. Rosin, “Registration of 3d point clouds and meshes: a survey from rigid to nonrigid.” IEEE Trans. Vis. Comput. Graph., vol. 19, no. 7, pp. 1199–1217, 2013.
- J. Bazin, Y. Seo, R. Hartley, and M. Pollefeys, “Globally optimal inlier set maximization with unknown rotation and focal length,” in European Conf. on Computer Vision (ECCV), 2014, pp. 803–817.
- G. Wahba, “A least squares estimate of satellite attitude,” SIAM Review, vol. 7, no. 3, p. 409, 1965.
- K. Arun, T. Huang, and S. Blostein, “Least-squares fitting of two 3-D point sets,” IEEE Trans. Pattern Anal. Machine Intell., vol. 9, no. 5, pp. 698–700, Sept. 1987.
- B. K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” J. Opt. Soc. Amer., vol. 4, no. 4, pp. 629–642, Apr 1987.
- H. Yang, J. Shi, and L. Carlone, “TEASER: Fast and Certifiable Point Cloud Registration,” IEEE Trans. Robotics, vol. 37, no. 2, pp. 314–333, 2020, extended arXiv version: 2001.07715.
- J. L. Barron, D. J. Fleet, and S. S. Beauchemin, “Performance of optical flow techniques,” Intl. J. of Computer Vision, vol. 12, no. 1, pp. 43–77, 1994.
- S. Vedula, P. Rander, R. Collins, and T. Kanade, “Three-dimensional scene flow,” IEEE Trans. Pattern Anal. Machine Intell., vol. 27, no. 3, pp. 475–480, 2005.
- L. Peng, C. Kümmerle, and R. Vidal, “On the convergence of IRLS and its variants in outlier-robust estimation,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 17808–17818.
- A. Barik and J. Honorio, “Outlier-robust estimation of a sparse linear model using invexity,” 2023.
- L. Carlone, “Estimation contracts for outlier-robust geometric perception,” Foundations and Trends (FnT) in Robotics, arXiv preprint: 2208.10521, 2023.
- K. MacTavish and T. D. Barfoot, “At all costs: A comparison of robust cost functions for camera correspondence outliers,” in Conf. Computer and Robot Vision. IEEE, 2015, pp. 62–69.
- M. J. Black and A. Rangarajan, “On the unification of line processes, outlier rejection, and robust statistics with applications in early vision,” Intl. J. of Computer Vision, vol. 19, no. 1, pp. 57–91, 1996.
- H. Yang, P. Antonante, V. Tzoumas, and L. Carlone, “Graduated non-convexity for robust spatial perception: From non-minimal solvers to global outlier rejection,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1127–1134, 2020, arXiv preprint: 1909.08605 (with supplemental material).
- H. Yang and L. Carlone, “Certifiably optimal outlier-robust geometric perception: Semidefinite relaxations and scalable global optimization,” IEEE Trans. Pattern Anal. Machine Intell., 2022.
- T.-J. Chin, Z. Cai, and F. Neumann, “Robust fitting in computer vision: Easy or hard?” in European Conf. on Computer Vision (ECCV), 2018.
- P. Antonante, V. Tzoumas, H. Yang, and L. Carlone, “Outlier-robust estimation: Hardness, minimally tuned algorithms, and applications,” IEEE Trans. Robotics, vol. 38, no. 1, pp. 281–301, 2021.
- M. Fischler and R. Bolles, “Random sample consensus: a paradigm for model fitting with application to image analysis and automated cartography,” Commun. ACM, vol. 24, pp. 381–395, 1981.
- J. Shi, H. Yang, and L. Carlone, “ROBIN: a graph-theoretic approach to reject outliers in robust estimation using invariants,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2021, arXiv preprint: 2011.03659.
- A. P. Bustos, T.-J. Chin, F. Neumann, T. Friedrich, and M. Katzmann, “A practical maximum clique algorithm for matching with pairwise constraints,” arXiv preprint arXiv:1902.01534, 2019.
- O. Enqvist, K. Josephson, and F. Kahl, “Optimal correspondences from pairwise constraints,” in Intl. Conf. on Computer Vision (ICCV), 2009, pp. 1295–1302.
- M. Bosse, G. Agamennoni, and I. Gilitschenski, “Robust estimation and applications in robotics,” Foundations and Trends in Robotics, vol. 4, no. 4, pp. 225–269, 2016.
- M. Charikar, J. Steinhardt, and G. Valiant, “Learning from untrusted data,” in Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, ser. STOC 2017, 2017, pp. 47–60.
- S. Karmalkar, A. Klivans, and P. Kothari, “List-decodable linear regression,” in Advances in Neural Information Processing Systems (NIPS), vol. 32, 2019.
- P. Raghavendra and M. Yau, “List decodable learning via sum of squares,” in Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA ’20, 2020, pp. 161–180.
- I. Diakonikolas, D. Kane, and D. Kongsgaard, “List-decodable mean estimation via iterative multi-filtering,” Advances in Neural Information Processing Systems, vol. 33, pp. 9312–9323, 2020.
- Y. Cherapanamjeri, S. Mohanty, and M. Yau, “List decodable mean estimation in nearly linear time,” in 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2020, pp. 141–148.
- L. Magri and A. Fusiello, “T-linkage: A continuous relaxation of j-linkage for multi-model fitting,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3954–3961.
- R. Toldo and A. Fusiello, “Robust multiple structures estimation with j-linkage,” in Computer Vision – ECCV 2008, D. Forsyth, P. Torr, and A. Zisserman, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 537–547.
- T.-J. Chin, H. Wang, and D. Suter, “Robust fitting of multiple structures: The statistical learning approach,” in 2009 IEEE 12th International Conference on Computer Vision, 2009, pp. 413–420.
- T.-J. Chin, D. Suter, and H. Wang, “Multi-structure model selection via kernel optimisation,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 3586–3593.
- L. Magri and A. Fusiello, “Multiple structure recovery via robust preference analysis,” Image and Vision Computing, vol. 67, pp. 1–15, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S026288561730152X
- M. Tepper and G. Sapiro, “Nonnegative matrix underapproximation for robust multiple model fitting,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 655–663.
- S. Lin, G. Xiao, Y. Yan, D. Suter, and H. Wang, “Hypergraph optimization for multi-structural geometric model fitting,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 8730–8737, Jul. 2019. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/4897
- P. Purkait, T.-J. Chin, A. Sadri, and D. Suter, “Clustering with hypergraphs: The case for large hyperedges,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 9, pp. 1697–1711, 2017.
- P. H. Torr, “Geometric motion segmentation and model selection,” Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 356, no. 1740, pp. 1321–1340, 1998.
- M. Zuliani, C. Kenney, and B. Manjunath, “The multiRANSAC algorithm and its application to detect planar homographies,” in IEEE International Conference on Image Processing 2005, vol. 3, 2005, pp. III–153.
- L. Magri and A. Fusiello, “Multiple models fitting as a set coverage problem,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3318–3326.
- H. Isack and Y. Boykov, “Energy-based geometric multi-model fitting,” International Journal of Computer Vision, vol. 97, pp. 123–147, Apr. 2012.
- D. Baráth and J. Matas, “Progressive-X: Efficient, anytime, multi-model fitting algorithm,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 3779–3787.
- X. Yi, C. Caramanis, and S. Sanghavi, “Alternating minimization for mixed linear regression,” in International Conference on Machine Learning. PMLR, 2014, pp. 613–621.
- H. Sedghi, M. Janzamin, and A. Anandkumar, “Provable tensor methods for learning mixtures of generalized linear models,” in Artificial Intelligence and Statistics. PMLR, 2016, pp. 1223–1231.
- Y. Li and Y. Liang, “Learning mixtures of linear regressions with nearly optimal complexity,” in Conference On Learning Theory. PMLR, 2018, pp. 1125–1144.
- X. Yi, C. Caramanis, and S. Sanghavi, “Solving a mixture of many random linear equations by tensor decomposition and alternating minimization,” CoRR, vol. abs/1608.05749, 2016. [Online]. Available: http://arxiv.org/abs/1608.05749
- S. Faria and G. Soromenho, “Fitting mixtures of linear regressions,” Journal of Statistical Computation and Simulation, vol. 80, no. 2, pp. 201–225, 2010. [Online]. Available: https://doi.org/10.1080/00949650802590261
- J. M. Klusowski, D. Yang, and W. D. Brinda, “Estimating the coefficients of a mixture of two linear regressions by expectation maximization,” IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3515–3524, 2019.
- J. Kwon and C. Caramanis, “Em converges for a mixture of many linear regressions,” in Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, S. Chiappa and R. Calandra, Eds., vol. 108. PMLR, 26–28 Aug 2020, pp. 1727–1736. [Online]. Available: https://proceedings.mlr.press/v108/kwon20a.html
- Z. Teed and J. Deng, “RAFT: Recurrent all-pairs field transforms for optical flow,” in Computer Vision – ECCV 2020, A. Vedaldi, H. Bischof, T. Brox, and J.-M. Frahm, Eds. Cham: Springer International Publishing, 2020, pp. 402–419.
- E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “FlowNet 2.0: Evolution of optical flow estimation with deep networks,” CoRR, vol. abs/1612.01925, 2016. [Online]. Available: http://arxiv.org/abs/1612.01925
- W.-C. Ma, S. Wang, R. Hu, Y. Xiong, and R. Urtasun, “Deep rigid instance scene flow,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 3609–3617.
- G. Yang and D. Ramanan, “Learning to segment rigid motions from two frames,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 1266–1275.
- Z. Teed and J. Deng, “RAFT-3D: Scene flow using rigid-motion embeddings,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 8371–8380.
- H. Liu, T. Lu, Y. Xu, J. Liu, W. Li, and L. Chen, “CamLiFlow: Bidirectional camera-LiDAR fusion for joint optical flow and scene flow estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5791–5801.
- H. Liu, T. Lu, Y. Xu, J. Liu, and L. Wang, “Learning optical flow and scene flow with bidirectional camera-lidar fusion,” 2023.
- T. K. Moon, “The expectation-maximization algorithm,” Signal processing magazine, IEEE, vol. 13, no. 6, pp. 47–60, 1996.
- B. Eckart, K. Kim, and J. Kautz, “HGMR: Hierarchical Gaussian mixtures for adaptive 3d registration,” in Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XV. Berlin, Heidelberg: Springer-Verlag, 2018, pp. 730–746. [Online]. Available: https://doi.org/10.1007/978-3-030-01267-0_43
- J. G. Rogers, A. J. Trevor, C. Nieto-Granda, and H. I. Christensen, “Slam with expectation maximization for moveable object tracking,” in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2010, pp. 2077–2082.
- V. Indelman, E. Nelson, N. Michael, and F. Dellaert, “Multi-robot pose graph localization and data association from unknown initial relative poses via expectation maximization,” in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 593–600.
- S. Bowman, N. Atanasov, K. Daniilidis, and G. Pappas, “Probabilistic data association for semantic SLAM,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2017, pp. 1722–1729.
- A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” arXiv preprint arXiv:2304.02643, 2023.
- Y. Xiang, R. Mottaghi, and S. Savarese, “Beyond pascal: A benchmark for 3d object detection in the wild,” in IEEE Winter Conf. on Appl. of Computer Vision. IEEE, 2014, pp. 75–82.
- N. Mayer, E. Ilg, P. Hausser, P. Fischer, D. Cremers, A. Dosovitskiy, and T. Brox, “A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4040–4048.
- A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.
- R. Hartley, J. Trumpf, Y. Dai, and H. Li, “Rotation averaging,” IJCV, vol. 103, no. 3, pp. 267–305, 2013.
- Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3D ShapeNets: A deep representation for volumetric shapes,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1912–1920.
- M. Menze and A. Geiger, “Object scene flow for autonomous vehicles,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3061–3070.