HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes (2403.02769v2)

Published 5 Mar 2024 in cs.CV

Abstract: Human-centric 3D scene understanding has recently drawn increasing attention, driven by its critical impact on robotics. However, real-life human-centric scenarios are extremely diverse and complex, and humans exhibit intricate motions and interactions. With limited labeled data, supervised methods struggle to generalize to such varied scenarios, which hinders real-life applications. Mimicking human intelligence, we propose an unsupervised 3D detection method for human-centric scenarios that transfers knowledge from synthetic human instances to real scenes. To bridge the gap between the distinct data representations and feature distributions of synthetic models and real point clouds, we introduce novel modules for effective instance-to-scene representation transfer and synthetic-to-real feature alignment. Our method outperforms current state-of-the-art techniques, achieving an 87.8% improvement in mAP and closely approaching the performance of fully supervised methods (62.15 mAP vs. 69.02 mAP) on the HuCenLife dataset.
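To put the reported numbers in perspective, the short sketch below works through the arithmetic using only the figures quoted in the abstract. The one assumption, labeled in the code as well, is that the 87.8% figure is a relative gain over the previous best unsupervised method; the abstract does not state that baseline's mAP, so the "implied baseline" is an illustration rather than a reported result.

```python
# Figures quoted in the abstract (HuCenLife dataset, mAP).
unsupervised_map = 62.15  # proposed unsupervised method
supervised_map = 69.02    # fully supervised reference
reported_gain = 0.878     # "87.8% improvement in mAP"

# Absolute gap to the fully supervised result, in mAP points.
gap = supervised_map - unsupervised_map

# Fraction of the fully supervised performance that is recovered.
ratio = unsupervised_map / supervised_map

# ASSUMPTION: the 87.8% is a relative gain over the previous best
# unsupervised method; the abstract does not give that baseline value.
implied_baseline = unsupervised_map / (1 + reported_gain)

print(f"gap: {gap:.2f} mAP")
print(f"fraction of supervised mAP: {ratio:.1%}")
print(f"implied baseline (assumed relative gain): {implied_baseline:.2f} mAP")
```

Under these figures the unsupervised method sits about 6.87 mAP below the supervised result, recovering roughly 90% of its performance.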

Authors (7)
  1. Yichen Yao (12 papers)
  2. Zimo Jiang (1 paper)
  3. Yujing Sun (21 papers)
  4. Zhencai Zhu (7 papers)
  5. Xinge Zhu (62 papers)
  6. Runnan Chen (32 papers)
  7. Yuexin Ma (97 papers)
Citations (3)