Omni-Scan: Autonomous 3D Digital Twin Scanning
- Omni-Scan is an autonomous robotic system and vision-based pipeline designed to generate omni-directional 3D digital twins for defect inspection.
- It integrates dual-arm scanning, multi-view segmentation, and a modified 3D Gaussian Splats model to overcome occlusions and achieve full surface coverage.
- The system enables precise industrial inspection by merging approximately 200 multi-view images per object and achieves an average defect detection accuracy of 83.3%.
Omni-Scan is an autonomous robotic system and vision-based software pipeline for generating visually accurate, omni-directional 3D digital twin models of objects using a bi-manual robot, with specific application to automated inspection for part defects. It addresses the limitations of traditional 3D object scanning methods—such as workspace restrictions and occlusions—by combining dual-arm object manipulation with advanced segmentation, depth estimation, and a modified 3D Gaussian Splats (3DGS) training procedure to handle gripper occlusions. The resulting workflow produces 360-degree photo-geometry models that enable detailed inspection and simulation, with demonstrated accuracy in both visual and geometric defect identification (Qiu et al., 1 Aug 2025).
1. Robotic Scanning Pipeline
Omni-Scan’s data acquisition process is structured around a bi-manual robot, exemplified by the ABB YuMi. The procedure begins with the left arm grasping the object, selecting a grasp by evaluating candidates computed from point clouds generated via stereo depth imaging and supporting RGB imagery. Once lifted, the object is rotated within the gripper through 20 azimuthal steps at 5 distinct elevation angles, yielding approximately 100 unique views. These captures are systematically spaced to maximize coverage and minimize occlusions caused by the gripper.
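A minimal sketch of how such a view grid could be enumerated is shown below; the 20 × 5 step counts follow the paper, while the angular ranges, ordering, and function names are illustrative assumptions:

```python
import numpy as np

def scan_viewpoints(n_azimuth=20, n_elevation=5,
                    elevation_range_deg=(-60.0, 60.0)):
    """Enumerate (azimuth, elevation) pairs for one in-hand scan.

    The 20 x 5 grid matches the Omni-Scan description; the elevation
    range and ordering are assumptions made for illustration.
    """
    azimuths = np.linspace(0.0, 360.0, n_azimuth, endpoint=False)
    elevations = np.linspace(*elevation_range_deg, n_elevation)
    return [(az, el) for el in elevations for az in azimuths]

views = scan_viewpoints()
print(len(views))  # ~100 candidate views per arm
```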
Because occluded surfaces are inevitable in a single-arm scan, the robot performs an in-air handover, transferring the object to the right arm. The right arm repeats the rotational scanning trajectory, providing a complementary set of views. This two-stage, bi-manual pipeline ensures that each surface of the object, previously blocked by a gripper, is exposed in at least one scan set. Transformations between left- and right-arm reference frames are initially estimated via the kinematic chain and handover configuration, with later refinement by point cloud registration using the Iterative Closest Point (ICP) algorithm.
| Stage | Method/Tool | Output |
|---|---|---|
| Left-Arm Scan | In-hand rotation, stereo imaging | ~100 multi-view images (with gripper occlusion) |
| Handover | Dual-arm transfer, grip localization | Pose transform initialization |
| Right-Arm Scan | Replicated scan trajectory | ~100 additional complementary images |
| Alignment | Rigid registration and ICP | Calibration for the merged dataset |
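The alignment stage summarized above can be prototyped with standard point cloud registration tooling. The sketch below uses Open3D's point-to-point ICP seeded with the kinematic handover estimate; the file names, distance threshold, and identity initialization are placeholders rather than values from the paper:

```python
import numpy as np
import open3d as o3d

# Hypothetical inputs: colored point clouds extracted from each scan set
# and an initial right-to-left transform estimated from the handover kinematics.
pcd_left = o3d.io.read_point_cloud("left_scan.ply")    # placeholder file
pcd_right = o3d.io.read_point_cloud("right_scan.ply")  # placeholder file
T_init = np.eye(4)  # stand-in for the kinematically estimated handover transform

# Refine the right-to-left transform with point-to-point ICP.
result = o3d.pipelines.registration.registration_icp(
    pcd_right, pcd_left,
    max_correspondence_distance=0.005,  # 5 mm; assumed threshold
    init=T_init,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
T_right_to_left = result.transformation
print("ICP fitness:", result.fitness)
```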
2. Multi-View Segmentation and Pose Estimation
Precise identification and isolation of the object from the gripper and background is achieved using a combination of state-of-the-art computer vision models and geometric reasoning. Each captured RGB image is processed as follows:
- Per-Pixel Depth Estimation: DepthAnything V2 provides detailed depth maps for each view.
- Object and Gripper Masking: The Segment Anything Model (SAM and SAM2) generates initial and propagated segmentation masks. These are refined using Non-Robot and Non-Gripper scoring to robustly exclude non-object pixels.
- Foreground Disambiguation: RAFT optical flow models compute inter-frame motion fields to help localize dynamic (object) regions and enhance mask accuracy, particularly under partial occlusion or background clutter; a sketch of how these cues can be combined follows this list.
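A hedged sketch of combining these cues into a final per-view mask is given below; the thresholds, mask semantics, and function name are assumptions rather than the paper's exact scoring procedure:

```python
import numpy as np

def refine_object_mask(object_mask, gripper_mask, flow_magnitude,
                       flow_threshold=0.5):
    """Combine segmentation and optical-flow cues into an object mask.

    object_mask    : (H, W) bool, candidate object mask (e.g. from SAM/SAM2)
    gripper_mask   : (H, W) bool, robot/gripper mask to exclude
    flow_magnitude : (H, W) float, per-pixel motion magnitude (e.g. from RAFT)
    """
    moving = flow_magnitude > flow_threshold      # dynamic foreground cue
    return object_mask & ~gripper_mask & moving   # keep moving, non-gripper pixels
```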
After segmentation, all camera poses are reconciled in a unified reference frame using kinematic transforms from the robot and known camera calibration, as well as the relative gripper poses at the handover event.
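A minimal sketch of this pose bookkeeping, assuming homogeneous 4x4 camera-to-base transforms and an estimated left-from-right handover transform (the symbols and frame names are illustrative):

```python
import numpy as np

def to_canonical_frame(T_left_from_right, right_cam_poses):
    """Map right-arm camera poses into the canonical left-arm frame.

    T_left_from_right : (4, 4) rigid transform from the handover estimate,
                        optionally refined by ICP.
    right_cam_poses   : iterable of (4, 4) camera poses expressed in the
                        right-arm base frame (from the kinematic chain).
    """
    return [T_left_from_right @ T_cam for T_cam in right_cam_poses]

# Example with placeholder identity poses.
canonical_poses = to_canonical_frame(np.eye(4), [np.eye(4) for _ in range(100)])
```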
3. 3D Gaussian Splats Modeling with Occlusion Compensation
The core 3D model is generated using a modified 3D Gaussian Splats approach, which represents object geometry and appearance as a collection of spatially distributed anisotropic Gaussian ellipsoids with associated color and opacity parameters. The modifications introduced for Omni-Scan address the unique challenges posed by occlusions from the robot’s grippers:
- Concatenated Dataset Training: View sets from both left and right scans are merged to provide maximal surface coverage.
- Gripper-Agnostic Loss Functions:
  - Object Opacity Loss: A loss between the rendered accumulation (the per-pixel opacity sum over Gaussians) and the binary object mask encourages fidelity to the segmented object boundaries.
  - Occlusion Masking: Per-pixel losses are nullified in any region where the gripper mask overlaps, so the model ignores ambiguous or absent surface data; information from the counterpart scan (where the same surface is unoccluded) then trains the model to complete the object (see the sketch after this list).
- Rigid Transform Alignment: Right-scan camera poses are transformed into the canonical left-arm frame using the handover transform, with further refinement from ICP applied to colored point clouds extracted from the individual scans.
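A hedged PyTorch sketch of the two loss modifications is shown below, assuming per-view binary object and gripper masks and a differentiable renderer that returns an RGB image plus an accumulated-opacity map; the L1 penalties and the loss weight are illustrative assumptions, not the paper's exact formulation:

```python
import torch

def occlusion_aware_loss(rendered_rgb, rendered_alpha,
                         target_rgb, object_mask, gripper_mask,
                         opacity_weight=0.1):
    """Photometric + object-opacity losses that ignore gripper-occluded pixels.

    rendered_rgb   : (H, W, 3) image rendered from the 3DGS model
    rendered_alpha : (H, W) accumulated opacity per pixel
    target_rgb     : (H, W, 3) captured image
    object_mask    : (H, W) binary object mask (1 = object)
    gripper_mask   : (H, W) binary gripper mask (1 = gripper)
    """
    object_mask = object_mask.float()
    valid = 1.0 - gripper_mask.float()  # nullify losses where the gripper occludes

    # Photometric loss restricted to unoccluded pixels.
    photo = (rendered_rgb - target_rgb).abs().mean(dim=-1)
    photo_loss = (photo * valid).sum() / valid.sum().clamp(min=1.0)

    # Object opacity loss: rendered accumulation should match the object mask.
    opac = (rendered_alpha - object_mask).abs()
    opacity_loss = (opac * valid).sum() / valid.sum().clamp(min=1.0)

    return photo_loss + opacity_weight * opacity_loss
```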
4. Applications: Digital Twin Generation and Defect Inspection
Omni-Scan constructs visually and geometrically faithful digital twins suitable for industrial tasks such as:
- Part Defect Inspection: The pipeline compares reconstructed models against reference objects to detect (see the sketch after this list):
  - Visual defects (scratches, surface anomalies) above 2 mm.
  - Geometric defects (deformations, faults) above 4.5 mm.
- Other Use Cases: The complete models support simulation environments, robot policy learning, virtual reality, and e-commerce visualization.
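As a hedged illustration of the geometric check, the sketch below compares a reconstructed point cloud against a reference model using Open3D nearest-neighbor distances; the 4.5 mm threshold reflects the reported detection limit, while the file names and the max-deviation test are assumptions about one possible implementation:

```python
import numpy as np
import open3d as o3d

GEOMETRIC_DEFECT_MM = 4.5  # reported geometric-defect detection limit

# Hypothetical inputs: points sampled from the reconstructed twin and a
# defect-free reference model of the same part, expressed in meters.
scanned = o3d.io.read_point_cloud("scanned_twin.ply")       # placeholder file
reference = o3d.io.read_point_cloud("reference_model.ply")  # placeholder file

# Nearest-neighbor distance from each scanned point to the reference surface.
distances_m = np.asarray(scanned.compute_point_cloud_distance(reference))
max_deviation_mm = 1000.0 * distances_m.max()

if max_deviation_mm > GEOMETRIC_DEFECT_MM:
    print(f"Geometric defect suspected: {max_deviation_mm:.1f} mm deviation")
else:
    print("No geometric defect above the detection threshold")
```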
In experimental evaluations across 12 household and industrial objects, Omni-Scan achieved an average accuracy of 83.3% in defect detection. Visual defect detection yielded 5/5 correct trials; geometric, 6/7. This performance suggests practical utility for non-contact, high-fidelity inspection tasks where full coverage is required.
5. System Performance and Technical Metrics
The overall pipeline for a single object typically executes in approximately 84 minutes on a single NVIDIA RTX 4090 GPU:
- Scanning (dual-arm data capture): ~9 minutes.
- Segmentation mask generation: ~46 minutes for 200 images.
- 3DGS training (including merged/individual sets): ~28 minutes.
The data alignment workflow—including handover and rigid ICP refinement—enables seamless fusion of both scan sets into the final 3DGS model. The object opacity and gripper-agnostic loss functions ensure that occlusions do not degrade fidelity or completeness.
6. Interactive Demonstrations and Community Resources
Interactive visualizations and videos of Omni-Scan reconstructions are available at https://berkeleyautomation.github.io/omni-scan/, which provide empirical validation of multi-view coverage and demonstrate the effectiveness of occlusion handling and merged scan completion. These resources display both pristine and defective object reconstructions with dynamic renderings for qualitative assessment (Qiu et al., 1 Aug 2025).
7. Significance, Limitations, and Outlook
Omni-Scan advances autonomous, high-coverage 3D object modeling by integrating dual-arm robotic coordination, advanced segmentation, and Gaussian Splat modeling with occlusion compensation losses. Its demonstrated accuracy for defect detection positions it as a competitive pipeline for industrial inspection. A plausible implication is that similar principles—bi-manual manipulation with occlusion-robust 3D representation—could see application in areas such as quality assurance, archival digitization, and real-world simulation transfer. Further research may investigate improvements in segmentation under severe occlusion, real-time throughput, or extension to non-rigid objects.