3D Semantic MapNet: Building Maps for Multi-Object Re-Identification in 3D

Published 19 Mar 2024 in cs.CV (arXiv:2403.13190v1)

Abstract: We study the task of 3D multi-object re-identification from embodied tours. Specifically, an agent is given two tours of an environment (e.g. an apartment) under two different layouts (e.g. arrangements of furniture). Its task is to detect and re-identify objects in 3D - e.g. a "sofa" moved from location A to B, a new "chair" in the second layout at location C, or a "lamp" from location D in the first layout missing in the second. To support this task, we create an automated infrastructure to generate paired egocentric tours of initial/modified layouts in the Habitat simulator using Matterport3D scenes, YCB and Google-scanned objects. We present 3D Semantic MapNet (3D-SMNet) - a two-stage re-identification model consisting of (1) a 3D object detector that operates on RGB-D videos with known pose, and (2) a differentiable object matching module that solves correspondence estimation between two sets of 3D bounding boxes. Overall, 3D-SMNet builds object-based maps of each layout and then uses a differentiable matcher to re-identify objects across the tours. After training 3D-SMNet on our generated episodes, we demonstrate zero-shot transfer to real-world rearrangement scenarios by instantiating our task in Replica, Active Vision, and RIO environments depicting rearrangements. On all datasets, we find 3D-SMNet outperforms competitive baselines. Further, we show jointly training on real and generated episodes can lead to significant improvements over training on real data alone.
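The differentiable object matching module described above solves correspondence estimation between two sets of detected objects, including objects present in only one tour (moved in or moved out). One standard way to realize such a matcher, in line with the Sinkhorn and SuperGlue works the paper builds on, is Sinkhorn normalization over a pairwise similarity matrix augmented with a "dustbin" row and column that absorbs unmatched objects. The sketch below is illustrative only, not the paper's implementation; the function name, score values, and dustbin constant are assumptions for the example.

```python
import numpy as np

def sinkhorn_match(scores, dustbin=0.0, iters=50):
    """Soft matching between two object sets via Sinkhorn iterations.

    scores  : (M, N) similarity matrix between objects detected in
              tour 1 (rows) and tour 2 (columns).
    dustbin : score assigned to the extra row/column that absorbs
              objects with no counterpart (added/removed objects).
    Returns a (M+1, N+1) soft assignment matrix.
    """
    M, N = scores.shape
    # Augment the score matrix with a dustbin row and column.
    aug = np.full((M + 1, N + 1), dustbin, dtype=np.float64)
    aug[:M, :N] = scores

    # Sinkhorn: alternate row and column normalization in log space,
    # which keeps the operation numerically stable and differentiable.
    log_P = aug.copy()
    for _ in range(iters):
        log_P -= np.logaddexp.reduce(log_P, axis=1, keepdims=True)
        log_P -= np.logaddexp.reduce(log_P, axis=0, keepdims=True)
    return np.exp(log_P)

# Toy example: object 0 in tour 1 corresponds to object 1 in tour 2,
# object 1 to object 0, and object 2 has no counterpart (it should
# receive most of its mass in the dustbin column, index 2).
scores = np.array([[0.1, 5.0],
                   [4.0, 0.2],
                   [0.0, 0.1]])
P = sinkhorn_match(scores)
```

At inference, hard correspondences can be read off by row-wise argmax (with a threshold), assigning any object whose best match is the dustbin to the "added/removed" category.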

