
WHU-Synthetic: A Synthetic Perception Dataset for 3-D Multitask Model Research (2402.19059v3)

Published 29 Feb 2024 in cs.CV

Abstract: End-to-end models capable of handling multiple sub-tasks in parallel have become a new trend, presenting significant challenges and opportunities for integrating multiple tasks within 3D vision. The limitations of 3D data acquisition conditions have not only restricted the exploration of many innovative research problems but have also caused existing 3D datasets to focus predominantly on single tasks. This has resulted in a lack of systematic approaches and theoretical frameworks for 3D multi-task learning, with most efforts merely serving as auxiliary support to the primary task. In this paper, we introduce WHU-Synthetic, a large-scale 3D synthetic perception dataset designed for multi-task learning, covering tasks from initial data augmentation (upsampling and depth completion), through scene understanding (segmentation), to macro-level tasks (place recognition and 3D reconstruction). Because all data are collected in the same environmental domain, the sub-tasks are inherently aligned, which allows multi-task models to be constructed without separate training methods. In addition, we implement several novel settings, such as sampling on city-level models, providing point clouds with different densities, and simulating temporal changes. These settings make it possible to realize ideas that are difficult to achieve in real-world scenarios and support more adaptive and robust multi-task perception. Using our dataset, we conduct several experiments to investigate mutual benefits between sub-tasks, revealing new observations, challenges, and opportunities for future research. The dataset is accessible at https://github.com/WHU-USI3DV/WHU-Synthetic.
