Object Score in Navigation and Tracking
- Object Score (OS) is a metric that integrates neural network detection confidence with geometric or temporal cues for robust object detection and tracking.
- The methodology uses a convex combination, min–max normalization, and a multiplicative-complement update rule to refine detection scores and maintain persistent tracklets.
- OS replaces legacy count-based track management with a lightweight continuous score suited to real-time use, and is reported to improve AMOTA/MOTA on nuScenes.
The Object Score (OS) is a quantitative metric for object detection confidence, object persistence, and robust multi-object tracking in both mobile robot navigation and autonomous driving contexts. As formalized in recent literature, OS refers to two connected methodologies sharing a common goal: fusing neural network detection confidence with geometric or temporal cues to optimize detection validity during dynamic environmental perception. Object Score thus spans both per-instance “objectness” assignment in mobile robot navigation (Choi et al., 2019) and multi-object tracklet confidence refinement in 3D object tracking systems (Benbarka et al., 2021).
1. Mathematical Formulation
Two major variants of OS appear in the literature. In the context of Mask R-CNN–based navigation, the Objectness Score is computed as a convex combination of the detector's winning-class probability $P$ and a distance-based normalization $D$:
$$\mathrm{OS} = \alpha P + (1 - \alpha) D$$
where
- $P$ is the classifier probability,
- $D$ uses the mean point-cloud depth $\bar{d}$, rescaled by min–max normalization,
- $\alpha \in [0, 1]$ is a tunable weight.
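The convex combination can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of Choi et al. (2019): the function names, the depth bounds, the direction of the depth normalization, and the choice of which term the weight multiplies are all assumptions made for the example.

```python
def min_max_normalize(value, lower, upper):
    """Standard min-max normalization to [0, 1], clamped at the bounds."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    scaled = (value - lower) / (upper - lower)
    return min(1.0, max(0.0, scaled))


def objectness_score(class_prob, mean_depth, depth_min, depth_max, weight):
    """Convex combination of classifier probability and normalized depth.

    Assumption: `weight` multiplies the probability and (1 - weight) the
    depth term; the source only states that the combination is convex.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1] for a convex combination")
    depth_term = min_max_normalize(mean_depth, depth_min, depth_max)
    return weight * class_prob + (1.0 - weight) * depth_term


# Hypothetical values: probability 0.8, mean depth 1.5 m within bounds [1, 2] m.
score = objectness_score(0.8, 1.5, 1.0, 2.0, weight=0.5)
```

Because the weights sum to one and both inputs lie in $[0, 1]$, the result also stays in $[0, 1]$, so it can be compared directly across detections.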
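In multi-object 3D tracking, the tracklet-specific object score $s_t$ evolves per frame. The initial value is just the detection confidence $c$. For each update:
- Decay: the score is first lowered by a decay term, giving a predicted score $\hat{s}_t$ that is clipped to the valid score range,
- Update if matched: four candidate update functions were compared; multiplicative-complement fusion, $s_t = 1 - (1 - \hat{s}_t)(1 - c_t)$ with matched detection confidence $c_t$, proved optimal (Benbarka et al., 2021).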
2. Algorithmic Implementation
For navigation contexts, the OS is computed frame-wise:
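- Mask R-CNN processes each RGB frame, yielding detections, each with a bounding box, a class label, and the inputs to the score (winning-class probability and mean point-cloud depth).
- The depth term is computed via min–max normalization; the OS is then calculated as the convex combination.
- Each detection queries its $k$-nearest persistent objects by 3D Euclidean distance; those locations are reprojected to image space via affine transformation, and the 2D IoU with the detection is calculated.
- The persistent list is updated: if the 2D IoU exceeds the association threshold, the detection with the higher OS is retained.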
- Otherwise, new objects are added directly (Choi et al., 2019).
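The update of the persistent list can be sketched as follows. This is a simplified illustration under stated assumptions: stored objects are assumed to carry a 2D box that has already been reprojected into the current image, the first overlapping neighbour is treated as the match, and the dictionary keys, the function names, and the threshold parameter are hypothetical.

```python
import math


def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def update_persistent(objects, detection, k, iou_threshold):
    """Merge one detection into the persistent object list (modified in place).

    Each entry is a dict with 'position' (3D point), 'box' (2D box in the
    current image) and 'score' (its OS).
    """
    # Indices of the k stored objects closest to the detection in 3D.
    nearest = sorted(
        range(len(objects)),
        key=lambda i: math.dist(objects[i]["position"], detection["position"]),
    )[:k]
    for i in nearest:
        if iou_2d(objects[i]["box"], detection["box"]) > iou_threshold:
            # Same object: keep whichever observation has the higher OS.
            if detection["score"] > objects[i]["score"]:
                objects[i] = detection
            return objects
    objects.append(detection)  # no overlap: register a new object
    return objects
```

Restricting the IoU test to the $k$ nearest stored objects keeps the per-detection cost independent of the total number of persistent objects, apart from the neighbour search itself.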
In tracking, the OS maintains a tracklet score:
- Each frame, tracklets are matched to detections via geometric or learned similarity, and updated using the “up-mul” rule.
- Unmatched tracklets suffer decay; new detections spawn new tracklets.
- Activation and deletion thresholds govern reporting and memory (Benbarka et al., 2021).
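One frame of this loop might look like the sketch below. Data association is assumed to have been done elsewhere and is passed in as a mapping from track ID to matched detection confidence. The `Tracklet` class, the subtractive decay, and the parameter names are illustrative assumptions, not the interface of any published tracker.

```python
from dataclasses import dataclass


@dataclass
class Tracklet:
    track_id: int
    score: float


def track_step(tracklets, matches, new_confidences, decay, delete_threshold, next_id):
    """Advance all tracklet scores by one frame.

    matches: dict mapping track_id -> confidence of its matched detection.
    new_confidences: confidences of detections that matched no tracklet.
    Returns (surviving tracklets, next unused track id).
    """
    survivors = []
    for t in tracklets:
        score = max(0.0, t.score - decay)            # assumed subtractive decay
        if t.track_id in matches:                     # matched: up-mul fusion
            score = 1.0 - (1.0 - score) * (1.0 - matches[t.track_id])
        if score >= delete_threshold:                 # otherwise purge
            survivors.append(Tracklet(t.track_id, score))
    for conf in new_confidences:                      # spawn new tracklets
        survivors.append(Tracklet(next_id, conf))
        next_id += 1
    return survivors, next_id
```

A new tracklet starts at its detection confidence, so a strong first detection can be reported immediately while a weak one must accumulate evidence over later frames.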
3. Decision Mechanisms and Threshold Policies
For instance persistence, detections are declared “same object” if their 2D IoU exceeds a high threshold. Among candidates, the detection with the maximal OS prevails. Non-overlapping cases create new objects. This high IoU threshold enhances spatial consistency and avoids false re-labelling under viewpoint changes (Choi et al., 2019).
In tracking, object scores regulate:
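- Reporting (a tracklet is active, and therefore reported, while its score stays at or above the activation threshold),
- Inactivation (hidden but not deleted while its score lies below the activation threshold yet above the deletion threshold),
- Deletion (purged once its score falls below the deletion threshold) (Benbarka et al., 2021).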
These thresholds eliminate classical hard-count rules (“min-hits,” “max-age”) in favor of real-valued score lifecycles, supporting more flexible and robust ID management.
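To illustrate how a real-valued score replaces a fixed “max-age” counter, the sketch below classifies a tracklet from its score and traces the states of a tracklet that is never matched again. The two-threshold scheme, the subtractive decay, and the numeric values in the usage lines are assumptions chosen for the example, not settings from Benbarka et al. (2021).

```python
def tracklet_state(score, active_threshold, delete_threshold):
    """Map a score to a lifecycle state (assumes delete <= active threshold)."""
    if score < delete_threshold:
        return "deleted"
    if score < active_threshold:
        return "inactive"
    return "active"


def unmatched_lifecycle(initial_score, decay, active_threshold,
                        delete_threshold, max_frames=100):
    """States visited by a tracklet that receives no further detections."""
    states = []
    score = initial_score
    for _ in range(max_frames):
        state = tracklet_state(score, active_threshold, delete_threshold)
        states.append(state)
        if state == "deleted":
            break
        score = max(0.0, score - decay)  # assumed subtractive decay
    return states


# Illustrative values only: a confident tracklet outlives a weak one.
strong = unmatched_lifecycle(0.75, 0.25, active_threshold=0.5, delete_threshold=0.25)
weak = unmatched_lifecycle(0.5, 0.25, active_threshold=0.5, delete_threshold=0.25)
# strong -> ['active', 'active', 'inactive', 'deleted']
# weak   -> ['active', 'inactive', 'deleted']
```

With a hard counter, both tracklets would survive the same number of missed frames; here the survival time depends on how much confidence had been accumulated.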
4. Geometric and Projection Models
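Affine projection, essential in the navigation formulation, maps 3D world coordinates to 2D image coordinates via the standard pinhole camera model:
$$s\,\tilde{\mathbf{x}} = K\,[R \mid \mathbf{t}]\,\tilde{\mathbf{X}}$$
where
- $K$ is the $3 \times 3$ intrinsic calibration,
- $[R \mid \mathbf{t}]$ is the $3 \times 4$ extrinsic (robot pose from odometry),
- $\tilde{\mathbf{X}} = (X, Y, Z, 1)^\top$,
- $\tilde{\mathbf{x}} = (u, v, 1)^\top$ in homogeneous coordinates, with $s$ the projective scale factor.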
This construction enables valid, viewpoint-invariant association between detections across frames based on spatial geometry instead of appearance alone (Choi et al., 2019).
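The pinhole projection can be written without any external dependency, as in the sketch below. It assumes the extrinsics are given as a world-to-camera rotation `R` and translation `t`, that matrices are nested lists, and that points behind the camera are rejected; the function name and the calibration values in the usage lines are hypothetical.

```python
def project_point(K, R, t, point_world):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    K: 3x3 intrinsics, R: 3x3 rotation, t: length-3 translation
    (world -> camera). Returns (u, v), or None if the point is not in
    front of the camera.
    """
    # Camera-frame coordinates: X_cam = R * X_world + t
    cam = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        return None
    # Homogeneous image coordinates: s * (u, v, 1) = K * X_cam
    hom = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return (hom[0] / hom[2], hom[1] / hom[2])


# Hypothetical calibration: 500 px focal length, principal point (320, 240).
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
pixel = project_point(K, R, t, (0.2, -0.1, 2.0))
```

Projecting the corners or centre of a stored object in this way yields the image-space region against which the 2D IoU with a new detection is computed.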
5. Experimental Protocols and Robustness
In robot navigation trials, OS was evaluated in Gazebo/ROS using a TurtleBot platform with RGB-D input. Testing spanned three heading angles and distances of 1–2 m against a static chair. Key controlled variables included Mask R-CNN filtering, min–max normalization bounds, and precise camera calibrations. Baseline Mask R-CNN outputs frequently misclassified the chair at close range. After OS filtering, all angles and distances yielded robust relabelling as “chair” without false positives, demonstrating persistent spatial and semantic stability across challenging conditions (Choi et al., 2019).
For tracking, the OS implementation on nuScenes using CenterPoint detections and PointTracker produced AMOTA/MOTA gains of up to +1.6/+2.3 on the validation set over legacy count-based logic. Late-fusion experiments (LiDAR-camera) confirmed similar gains over voting-based schemes. Empirically, the “up-mul” update and a modest decay were critical; larger decay or additive updates degraded performance. Optimal threshold settings further improved both recall and precision (Benbarka et al., 2021).
6. Computational Efficiency and Comparison to Alternatives
Object Score algorithms offer substantial runtime benefits:
- In navigation-based OS, only lightweight 2D IoU and a linear 3D-to-2D projection (one per stored object) are needed, which avoids the considerable overhead of 3D bounding-box and 3D IoU computation.
- In tracking, each score update is one arithmetic operation per object, compared to count-based logic that needs more state and is less sensitive to variable detection reliability.
Replacing hard instance counters with OS (continuous score), together with assessment via geometric association, reduces redundant computations and the invocation rate of downstream navigation or SLAM modules. Both the navigation and tracking OS schemes are thus compatible with real-time, large-scale, and memory-efficient deployment (Choi et al., 2019, Benbarka et al., 2021).
7. Comparative Advantages and Observed Trade-Offs
Distinctive advantages of OS-based approaches, as reported in the foundational papers, include:
- Greater robustness to viewpoint and range variation (navigation),
- Enhanced AMOTA and MOTA in multi-object tracking (+1.6 and +2.3, respectively, on nuScenes validation),
- Smoother, more reliable instance lifetimes without the need for hard counter-based logic,
- Competitive or superior late-fusion ensembling versus voting-based aggregation (with up to +1.67 AMOTA gains),
- Trivial computational demand and straightforward hyperparameter tuning (primarily decay rate and activation thresholds).
Additive score updates were found to produce spurious overconfidence (inflating false positive/identity switches). The multiplicative-complement “up-mul” rule yielded the best balance between incorporating new detection evidence and maintaining trajectory stability (Benbarka et al., 2021).
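The overconfidence effect is easy to reproduce numerically. The sketch below fuses a run of identical weak detections with a clipped-sum update and with the multiplicative-complement update; the clipped sum is one plausible form of an additive rule, assumed here for illustration, and decay is omitted to isolate the update step.

```python
def up_add(score, conf):
    """Assumed additive update: sum of scores, clipped at 1."""
    return min(1.0, score + conf)


def up_mul(score, conf):
    """Multiplicative-complement update."""
    return 1.0 - (1.0 - score) * (1.0 - conf)


def fuse_sequence(confidences, update):
    """Score history when every detection in the sequence is matched."""
    score = confidences[0]          # a tracklet starts at its first confidence
    history = [score]
    for conf in confidences[1:]:
        score = update(score, conf)
        history.append(score)
    return history


weak_detections = [0.25] * 5
additive = fuse_sequence(weak_detections, up_add)
multiplicative = fuse_sequence(weak_detections, up_mul)
# additive       -> [0.25, 0.5, 0.75, 1.0, 1.0]
# multiplicative -> [0.25, 0.4375, 0.578125, 0.68359375, 0.7626953125]
```

Four weak detections are enough to saturate the additive score at 1, whereas the multiplicative-complement score approaches 1 only asymptotically, so a run of low-confidence matches cannot masquerade as a certain track.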
A plausible implication is that Object Score, in both navigation and tracking settings, constitutes a unifying approach to dynamic object reliability assessment, bridging detector confidence with spatiotemporal and geometric persistence. In the experiments reported by the two source papers, it outperforms legacy count-based techniques while remaining light enough for resource-constrained, real-time robotic platforms.