Object Integrity Score (OIS)
- Object Integrity Score (OIS) is a metric that integrates deep learning class probabilities with normalized depth measurements to maintain semantic consistency in object labeling.
- The method employs an online matching procedure using 2D IoU and 3D nearest-neighbor search to update object records only when a higher OIS is detected.
- Experimental results demonstrated that OIS achieves 100% labeling consistency at close ranges, significantly outperforming conventional methods on mobile robot platforms.
The Object Integrity Score (OIS), as introduced by Choi et al., is a quantitative metric designed to enhance semantic consistency in object detection for mobile robot navigation using deep learning-based detectors such as Mask R-CNN. The OIS integrates the detector’s probabilistic class output with a geometric term that explicitly leverages depth information, ensuring robust maintenance of object identity and class assignment over spatial and angular variations, especially as the robot moves in dynamic environments (Choi et al., 2019).
1. Formal Definition of the Object Integrity Score
The OIS is defined to combine (i) the class probability assigned by a deep learning network (DLN) to a detected object and (ii) a normalized distance-based term that rewards detections made within suitable observation ranges. Mathematically,

$$\mathrm{OIS} = \alpha \, p_c + (1 - \alpha) \, s_d,$$

where
- $p_c$ is the class probability output by the DLN (e.g., Mask R-CNN) for a given bounding box,
- $s_d$ quantifies the normalized proximity of the object to the camera, and
- $\alpha \in [0, 1]$ is a user-tunable fusion coefficient.
The normalized depth score is formally given by

$$s_d = \frac{d - d_{\min}}{d_{\max} - d_{\min}},$$

where $d$ is the measured distance from camera to object, $d_{\min}$ is the minimum reliable distance (set to 0.8 m), and $d_{\max}$ is the maximum reliable range (set to 3.0 m). The score is clipped to the unit interval: $s_d = 0$ for $d \le d_{\min}$ and $s_d = 1$ for $d \ge d_{\max}$.
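As a concrete illustration, the score can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the linear, unit-clipped form of the depth term and the default fusion weight `alpha = 0.5` are assumptions for illustration.

```python
def depth_score(d, d_min=0.8, d_max=3.0):
    """Normalized depth score s_d, clipped to [0, 1].

    Assumes a linear normalization of the measured camera-to-object
    distance d between the reliable bounds d_min and d_max."""
    return min(1.0, max(0.0, (d - d_min) / (d_max - d_min)))

def ois(p_class, d, alpha=0.5):
    """Object Integrity Score: convex combination of the DLN class
    probability and the depth score (alpha = 0.5 is an assumed,
    user-tunable fusion coefficient)."""
    return alpha * p_class + (1.0 - alpha) * depth_score(d)
```

For example, a detection with class probability 0.9 observed at the far edge of the reliable range (d = 3.0 m) would receive OIS = 0.5 · 0.9 + 0.5 · 1.0 = 0.95 under these assumptions.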
2. On-line Computation and Matching Procedure
The OIS is integrated into a real-time pipeline for semantic mapping and persistent object labeling as follows:
- RGB-D Sensing and Preparation: Upon acquiring a frame, Mask R-CNN produces candidate 2D bounding boxes with associated class probabilities.
- Depth Localization: For each bounding box, all in-box depth pixels are aggregated to compute a representative mean depth, from which the corresponding 3D object centroid is estimated.
- Object Re-identification via Projection: Each tracked object record maintains its historical 3D position, bounding box, and stored OIS. Prior objects are projected into the current view as $p = K [R \mid t] \, P$, where $K$ is the camera calibration matrix and $[R \mid t]$ encodes the robot pose obtained from odometry.
- Matching Procedure: For each new detection (with its computed OIS), the algorithm:
  - locates the nearest tracked objects in 3D,
  - computes the 2D Intersection-over-Union (IoU) between the new bounding box and each reprojected stored box,
  - associates the detection with a tracked object if their IoU exceeds a threshold,
  - updates the matched object record only if the new detection's OIS is greater; otherwise the new detection is ignored, and
  - adds unmatched detections as new entries.
This mechanism ensures that for any tracked location, only the label with the highest OIS is retained, enhancing class stability as the robot’s viewpoint changes.
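The matching-and-update loop above can be sketched as follows. The data layout (dict records with `'box'`, `'label'`, `'ois'` fields) and the IoU threshold of 0.5 are assumptions, and for brevity the 3D nearest-neighbor pre-filter is replaced by a greedy scan over all stored records.

```python
def project(K, R, t, X):
    """Project a 3D point X (world frame) to pixel (u, v) using the
    camera calibration matrix K and the pose (R, t) from odometry."""
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return (K[0][0] * xc[0] / xc[2] + K[0][2],
            K[1][1] * xc[1] / xc[2] + K[1][2])

def iou_2d(a, b):
    """Intersection-over-Union of axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def update_records(records, detections, iou_thresh=0.5):
    """Associate new detections with stored records by 2D IoU; a matched
    record is overwritten only when the detection carries a higher OIS,
    and unmatched detections become new records."""
    for det in detections:
        best, best_iou = None, 0.0
        for rec in records:
            overlap = iou_2d(det['box'], rec['box'])
            if overlap > best_iou:
                best, best_iou = rec, overlap
        if best is not None and best_iou >= iou_thresh:
            if det['ois'] > best['ois']:
                best.update(det)       # higher-OIS label wins
        else:
            records.append(dict(det))  # unmatched -> new entry
    return records
```

Because a record is replaced only on a strictly higher OIS, a low-confidence or out-of-range re-detection cannot overwrite a label that was established under good observation conditions.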
3. Experimental Evaluation and Comparative Analysis
Experiments were conducted in simulation using ROS + Gazebo, with a TurtleBot carrying an RGB-D sensor navigating a scene containing a single object (e.g., a chair). Robot headings were systematically varied, and camera-to-object distances were tested from 0.3 m to 3 m.
- Baseline: Mask R-CNN, storing only the maximum probability class per detection.
- Proposed: Mask R-CNN augmented with OIS and on-line matching.
Qualitative results revealed that, with standard Mask R-CNN, correct object labels (“chair”) were retained only at longer distances; at closer ranges, dramatic class “drift” occurred, with the label switching to unrelated categories or disappearing altogether. By contrast, the OIS-based method preserved the correct label (“chair”) across all tested distances and headings, demonstrating 100% consistency versus less than 50% under the baseline at close range. Figure 1 in (Choi et al., 2019) visually shows temporally stable labeling with OIS.
4. Runtime and Computational Efficiency
All components of the OIS pipeline—bounding box IoU, 3D-to-2D reprojection, and nearest-neighbor search in 3D—are implemented to run in real time, with per-frame computation times significantly below 120 ms. This efficiency contrasts with conventional 3D IoU-based methods (Song & Xiao, 2016; Qi et al., 2018), which typically require explicit 3D bounding-box fitting and incur at least 120 ms per frame. The 2D-plus-depth paradigm adopted here reduces this overhead by approximately one order of magnitude (Choi et al., 2019).
5. Key Limitations and Qualitative Boundaries
The primary evaluation setting is a synthetic, static scene containing a single object, limiting assessment of performance in multi-object, dynamic, or cluttered environments. The matching procedure leverages strict 2D IoU thresholds and local neighbor search in 3D; this could become less robust with rapid viewpoint shifts, partial occlusion, or dynamic backgrounds. A plausible implication is that, while OIS robustly serves single-object scenarios in clean surroundings, further extensions are necessary for comprehensive deployment in real-world settings.
6. Prospective Extensions and Research Directions
Future work, as articulated by the authors, includes discriminating which maintained objects actually represent dynamic obstacles, extending the framework to handle clutter and dynamic scenes, and integrating motion models to track moving landmarks. Effective association and label maintenance under real-world complexity—such as transient occlusion, overlapping objects, or active scene manipulation—remain open areas for investigation (Choi et al., 2019).
7. Comparative Perspective and Technological Significance
The OIS advances the state of the art in consistency-preserving object detection for embodied robotics by demonstrating that lightweight integration of detection probability and range context can mitigate catastrophic label drift as a robot traverses a scene. Unlike 3D IoU or volumetric matching schemes, OIS yields robust label locking with minimal computational burden. This suggests the potential for OIS-informed modules in future real-time SLAM and semantic mapping systems, subject to validation in more complex, dynamic environments (Choi et al., 2019).