FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything (2403.00175v2)
Abstract: In computer vision, integrating advanced techniques into the processing of RGB-D camera inputs is challenging because of the complexities arising from diverse environmental conditions and varying object appearances. This paper therefore introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery. Traditional computer vision systems are limited in simultaneously capturing precise object boundaries and achieving high-precision object detection on depth maps, as they are mainly designed for RGB cameras. To address this challenge, FusionVision adopts an integrated approach that merges state-of-the-art object detection techniques with advanced instance segmentation methods. Integrating these components enables a holistic interpretation of RGB-D data, that is, a unified analysis of information obtained from both the color (RGB) and depth (D) channels, which facilitates the extraction of comprehensive and accurate object information. The proposed FusionVision pipeline employs YOLO to identify objects within the RGB image domain. Subsequently, FastSAM, an innovative semantic segmentation model, is applied to delineate object boundaries, yielding refined segmentation masks. The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation, enhancing overall precision in 3D object segmentation. The code and pre-trained models are publicly available at https://github.com/safouaneelg/FusionVision/.
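The last stage described above, carrying a 2D segmentation mask into 3D, can be illustrated with a minimal sketch. It is not the code from the linked repository. It assumes a pinhole camera model, a depth image already aligned to the RGB frame, and a boolean mask of the kind a segmentation model would produce. The function name `mask_to_points`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `depth_scale` parameter are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def mask_to_points(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    depth_scale: float = 1.0,
) -> List[Point]:
    """Back-project masked depth pixels to 3D camera coordinates.

    Assumes a pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels outside the mask, or with no valid depth (<= 0), are skipped.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (d, keep) in enumerate(zip(depth_row, mask_row)):
            if not keep or d <= 0:
                continue
            z = d * depth_scale  # convert raw depth units to metric depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x3 depth image and object mask (illustrative values only).
depth = [[0.0, 2.0, 2.0],
         [4.0, 0.0, 2.0]]
mask = [[True, True, False],
        [True, True, True]]

points = mask_to_points(depth, mask, fx=2.0, fy=2.0, cx=1.0, cy=0.0)
print(points)
```

In this toy case only three pixels are both inside the mask and carry valid depth, so the resulting object point cloud has three points. A real pipeline would take the intrinsics from the camera calibration and would typically denoise the resulting cloud afterwards.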