RaSim: A Range-aware High-fidelity RGB-D Data Simulation Pipeline for Real-world Applications (2404.03962v1)

Published 5 Apr 2024 in cs.CV

Abstract: In robotic vision, a de-facto paradigm is to learn in simulated environments and then transfer to real-world applications, which poses an essential challenge in bridging the sim-to-real domain gap. While mainstream works tackle this problem in the RGB domain, we focus on depth data synthesis and develop a range-aware RGB-D data simulation pipeline (RaSim). In particular, high-fidelity depth data is generated by imitating the imaging principle of real-world sensors. A range-aware rendering strategy is further introduced to enrich data diversity. Extensive experiments show that models trained with RaSim can be directly applied to real-world scenarios without any finetuning and excel at downstream RGB-D perception tasks.
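
The depth synthesis described in the abstract imitates the imaging principle of real RGB-D sensors, which are typically active stereo devices: project an IR pattern, capture a rectified stereo pair, and recover depth from disparity. The sketch below is a minimal illustration of that idea, not the authors' pipeline. It assumes a rectified grayscale stereo pair has already been rendered, uses OpenCV's StereoSGBM matcher as a stand-in for the sensor's on-board correlation step, and applies a simple working-range cutoff as a crude analogue of a range-aware strategy. The function name `simulate_stereo_depth` and all parameter values are hypothetical.

```python
# Illustrative sketch (not the RaSim implementation): synthesize a sensor-like
# depth map from a rendered, rectified stereo pair via semi-global matching.
import numpy as np
import cv2


def simulate_stereo_depth(left_ir, right_ir, fx, baseline_m,
                          min_range_m=0.3, max_range_m=2.0):
    """Return a depth map (meters) from a rectified 8-bit grayscale stereo pair."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,          # must be divisible by 16
        blockSize=7,
        P1=8 * 7 * 7,                # SGM smoothness penalties
        P2=32 * 7 * 7,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    # OpenCV returns disparity as fixed-point int16 scaled by 16.
    disparity = matcher.compute(left_ir, right_ir).astype(np.float32) / 16.0

    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = fx * baseline_m / disparity[valid]

    # Range-aware step (illustrative): discard depths outside the sensor's
    # working range, mimicking the missing-value pattern of real RGB-D cameras.
    depth[(depth < min_range_m) | (depth > max_range_m)] = 0.0
    return depth


if __name__ == "__main__":
    # Toy example on random texture, just to show the call sequence.
    rng = np.random.default_rng(0)
    left = rng.integers(0, 255, (480, 640), dtype=np.uint8)
    right = np.roll(left, -20, axis=1)   # crude constant-disparity shift
    d = simulate_stereo_depth(left, right, fx=600.0, baseline_m=0.05)
    print("valid pixels:", int((d > 0).sum()))
```

Presumably, the actual pipeline renders the stereo pair in simulation with the target sensor's projector pattern and intrinsics, so the synthesized depth inherits the noise and missing-data characteristics of the real device rather than the idealized rendering used in this toy example.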
