
Object Score in Navigation and Tracking

Updated 26 May 2026
  • Object Score (OS) is a metric that integrates neural network detection confidence with geometric or temporal cues for robust object detection and tracking.
  • The methodology uses a convex combination, min–max normalization, and a multiplicative-complement update rule to refine detection scores and maintain persistent tracklets.
  • OS replaces legacy count-based track management with a continuous score at low computational cost, supporting real-time operation, with reported AMOTA/MOTA improvements in autonomous systems.

The Object Score (OS) is a quantitative metric for object detection confidence, object persistence, and robust multi-object tracking in both mobile robot navigation and autonomous driving contexts. As formalized in recent literature, OS refers to two connected methodologies sharing a common goal: fusing neural network detection confidence with geometric or temporal cues to optimize detection validity during dynamic environmental perception. Object Score thus spans both per-instance “objectness” assignment in mobile robot navigation (Choi et al., 2019) and multi-object tracklet confidence refinement in 3D object tracking systems (Benbarka et al., 2021).

1. Mathematical Formulation

Two major variants of OS appear in the literature. In the context of Mask R-CNN–based navigation, the Objectness Score $S_{\rm obj}$ is computed as a convex combination of the detector's winning-class probability $S_{\rm poc}$ and a distance-based normalization $S_{\rm depth}$:

$$S_{\rm obj} = a\,S_{\rm poc} + (1-a)\,S_{\rm depth}$$

where

  • $S_{\rm poc} \in [0,1]$ is the classifier probability,
  • $S_{\rm depth} = \frac{d_i - \min(d)}{\max(d) - \min(d)}$ uses the mean point-cloud depth $d_i$,
  • $a \in [0,1]$ is a tunable weight.
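The convex combination above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the default weight `a=0.5`, and the handling of the degenerate case where all depths are equal are assumptions made here.

```python
from typing import List, Sequence


def min_max_normalize(depths: Sequence[float]) -> List[float]:
    """Min-max normalize mean point-cloud depths to [0, 1] (S_depth)."""
    lo, hi = min(depths), max(depths)
    if hi == lo:
        # ASSUMPTION: the source does not cover identical depths;
        # returning 0.0 for every detection is a choice made here.
        return [0.0 for _ in depths]
    return [(d - lo) / (hi - lo) for d in depths]


def objectness_scores(
    s_poc: Sequence[float], depths: Sequence[float], a: float = 0.5
) -> List[float]:
    """S_obj = a * S_poc + (1 - a) * S_depth for each detection in a frame."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("weight a must lie in [0, 1]")
    if len(s_poc) != len(depths):
        raise ValueError("one depth is required per detection")
    s_depth = min_max_normalize(depths)
    return [a * p + (1.0 - a) * sd for p, sd in zip(s_poc, s_depth)]
```

Because the normalization is taken over the detections in the current frame, $S_{\rm depth}$ is relative: with the formula as written, the farthest detection in a frame receives $S_{\rm depth} = 1$ and the nearest receives $0$. Setting `a=1.0` recovers the raw classifier probability.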

In multi-object 3D tracking, the tracklet-specific object score $c_t$ evolves per frame. The initial value is the detection confidence $s_j^t$. For each update:

  • Decay: the previous score is reduced by a decay term and the result is clipped to a bounded range,
  • Update if matched: four candidate update functions were compared; multiplicative-complement fusion, which combines the tracklet score and the matched detection confidence through the product of their complements, proved optimal (Benbarka et al., 2021).
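A minimal sketch of the per-frame score evolution follows. Several details are assumptions made for illustration rather than taken from the text above: the decay is modelled as subtracting a constant `sigma`, the clipping range is taken to be $[0, 1]$, the multiplicative-complement rule is written as $1 - (1 - c)(1 - s)$, and decay is applied before the update. All function names are hypothetical.

```python
from typing import Optional


def clip01(x: float) -> float:
    """Clamp a score to [0, 1] (ASSUMED clipping range)."""
    return max(0.0, min(1.0, x))


def decay(score: float, sigma: float) -> float:
    """ASSUMPTION: subtractive decay by a constant sigma, then clipping."""
    return clip01(score - sigma)


def update_mul(score: float, det_conf: float) -> float:
    """Multiplicative-complement fusion, written here as 1 - (1 - c)(1 - s)."""
    return clip01(1.0 - (1.0 - score) * (1.0 - det_conf))


def step(score: float, sigma: float, det_conf: Optional[float] = None) -> float:
    """One frame: decay the tracklet score, then fuse a matched detection if any."""
    predicted = decay(score, sigma)
    if det_conf is None:
        return predicted  # unmatched tracklet: decay only
    return update_mul(predicted, det_conf)
```

Under this form a matched detection can only raise the score (never above 1), and a detection with zero confidence leaves the decayed score unchanged.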

2. Algorithmic Implementation

For navigation contexts, the OS is computed frame-wise:

  • Mask R-CNN processes each RGB frame, yielding per-detection outputs including the bounding box, class label, and class probability $S_{\rm poc}$.
  • $S_{\rm depth}$ is computed via min–max normalization; $S_{\rm obj}$ is then calculated.
  • Each detection queries its $k$ nearest persistent objects by 3D Euclidean distance; those locations are reprojected to image space via affine transformation, and the 2D IoU is calculated.
  • The persistent list is updated: if the 2D IoU exceeds the association threshold, the detection with the higher $S_{\rm obj}$ is retained.
  • Otherwise, new objects are added directly (Choi et al., 2019).
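The persistence update in the last two steps can be sketched as follows. This is a simplified illustration under stated assumptions: boxes are axis-aligned `(x1, y1, x2, y2)` tuples, stored objects are assumed to be already reprojected into the current image, the nearest-neighbour query in 3D is omitted, and the `TrackedObject` type and `iou_thresh` parameter are hypothetical names with no default value implied by the source.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class TrackedObject:
    label: str
    box: Box      # 2D box in the current image (already reprojected)
    s_obj: float  # objectness score


def iou_2d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def update_persistent(
    persistent: List[TrackedObject], det: TrackedObject, iou_thresh: float
) -> None:
    """Keep the higher-scoring entry on overlap; otherwise add a new object."""
    best_idx, best_iou = -1, 0.0
    for i, obj in enumerate(persistent):
        overlap = iou_2d(obj.box, det.box)
        if overlap > best_iou:
            best_idx, best_iou = i, overlap
    if best_idx >= 0 and best_iou > iou_thresh:
        if det.s_obj > persistent[best_idx].s_obj:
            persistent[best_idx] = det  # new detection wins
    else:
        persistent.append(det)  # no sufficient overlap: new object
```

For example, a stored "table" with score 0.4 that overlaps strongly with a new "chair" detection of score 0.7 is relabelled as "chair", while a detection elsewhere in the image is appended as a new object.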

In tracking, the OS maintains a tracklet score:

  • Each frame, tracklets are matched to detections via geometric or learned similarity, and updated using the multiplicative-complement (“up-mul”) rule.
  • Unmatched tracklets suffer decay; new detections spawn new tracklets.
  • Activation and deletion thresholds govern reporting and memory (Benbarka et al., 2021).
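The per-frame bookkeeping described in this list might look like the sketch below. It assumes the matching step has already been done elsewhere and is passed in as a mapping from tracklet ID to matched detection confidence; the subtractive decay, the `sigma` parameter, and the function name are illustrative assumptions, and threshold handling is left out.

```python
from typing import Dict, List


def step_frame(
    scores: Dict[int, float],
    matched: Dict[int, float],
    new_dets: List[float],
    sigma: float,
    next_id: int,
) -> int:
    """Advance all tracklet scores by one frame, in place.

    scores:   tracklet id -> current score
    matched:  tracklet id -> confidence of the detection matched this frame
    new_dets: confidences of detections that matched no tracklet
    Returns the next unused tracklet id.
    """
    for tid in list(scores):
        # ASSUMPTION: subtractive decay, floored at zero.
        decayed = max(0.0, scores[tid] - sigma)
        if tid in matched:
            # Multiplicative-complement update with the matched confidence.
            scores[tid] = 1.0 - (1.0 - decayed) * (1.0 - matched[tid])
        else:
            scores[tid] = decayed  # unmatched tracklets only decay
    for conf in new_dets:
        scores[next_id] = conf  # a new tracklet starts at its detection confidence
        next_id += 1
    return next_id
```

Keeping one float per tracklet is the whole state here, which is what distinguishes this scheme from counter-based track management.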

3. Decision Mechanisms and Threshold Policies

For instance persistence, detections are declared “same object” if their 2D IoU exceeds a high threshold. Among candidates, the detection with maximal $S_{\rm obj}$ prevails. Non-overlapping cases create new objects. This high IoU threshold enhances spatial consistency and avoids false re-labelling under viewpoint changes (Choi et al., 2019).

In tracking, object scores regulate:

  • Reporting (a tracklet is active while its score meets the activation threshold, for which a canonical value is given),
  • Inactivation (a tracklet is hidden but not deleted once its score falls below the corresponding threshold, for which an optimal value is reported),
  • Deletion (a tracklet is purged once its score falls below the deletion threshold, for which one fixed setting suffices) (Benbarka et al., 2021).

These thresholds eliminate classical hard-count rules (“min-hits,” “max-age”) in favor of real-valued score lifecycles, supporting more flexible and robust ID management.
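A score-based lifecycle of this kind can be expressed as a simple mapping from score to state. The sketch below assumes a two-threshold layout (an activation threshold above a deletion threshold, with "hidden" in between); the names `act_thresh` and `del_thresh` are hypothetical and no particular threshold values are implied.

```python
from enum import Enum


class TrackState(Enum):
    ACTIVE = "active"    # reported in the tracker output
    HIDDEN = "hidden"    # kept in memory but not reported
    DELETED = "deleted"  # purged from memory


def track_state(score: float, act_thresh: float, del_thresh: float) -> TrackState:
    """Map a real-valued tracklet score to a lifecycle state.

    ASSUMPTION: two thresholds with del_thresh <= act_thresh.
    """
    if del_thresh > act_thresh:
        raise ValueError("deletion threshold must not exceed activation threshold")
    if score >= act_thresh:
        return TrackState.ACTIVE
    if score >= del_thresh:
        return TrackState.HIDDEN
    return TrackState.DELETED
```

With illustrative thresholds of 0.5 and 0.1, a tracklet at 0.6 is reported, one at 0.3 is retained silently, and one at 0.05 is removed. No hit or miss counters are needed.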

4. Geometric and Projection Models

Affine projection, essential in the navigation formulation, maps 3D world coordinates to 2D image coordinates via the standard pinhole camera model:

$$s\,\tilde{\mathbf{x}} = K\,[R \mid \mathbf{t}]\,\tilde{\mathbf{X}}$$

where

  • $K$ is the $3 \times 3$ intrinsic calibration,
  • $[R \mid \mathbf{t}]$ is the $3 \times 4$ extrinsic (robot pose from odometry),
  • $\tilde{\mathbf{X}} = (X, Y, Z, 1)^\top$ is the 3D world point,
  • $\tilde{\mathbf{x}} = (u, v, 1)^\top$ is the image point, defined up to the scale factor $s$, in homogeneous coordinates.

This construction enables valid, viewpoint-invariant association between detections across frames based on spatial geometry instead of appearance alone (Choi et al., 2019).
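As a concrete illustration of the pinhole projection, the sketch below maps a 3D world point to pixel coordinates using plain Python lists. The function name, the example matrices, and the convention that the extrinsic matrix maps world coordinates into the camera frame are assumptions for this example.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def project_point(
    K: Matrix, Rt: Matrix, point_world: Sequence[float]
) -> Tuple[float, float]:
    """Project a 3D world point to pixel coordinates (u, v).

    K  : 3x3 intrinsic matrix
    Rt : 3x4 extrinsic matrix [R | t], ASSUMED to map world -> camera frame
    """
    xh = [point_world[0], point_world[1], point_world[2], 1.0]  # homogeneous
    cam = [sum(Rt[r][c] * xh[c] for c in range(4)) for r in range(3)]
    img = [sum(K[r][c] * cam[c] for c in range(3)) for r in range(3)]
    if img[2] <= 0.0:
        raise ValueError("point lies behind the camera")
    # Divide by the homogeneous scale to obtain pixel coordinates.
    return img[0] / img[2], img[1] / img[2]


# Illustrative values only: focal length 500 px, principal point (320, 240),
# camera at the world origin looking along +Z.
K_EXAMPLE = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
RT_IDENTITY = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
```

With these example matrices, the point `(0.2, -0.1, 2.0)` projects to approximately `(370, 215)`. Projecting the corners of a stored object's extent in this way yields the 2D box used for IoU association.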

5. Experimental Protocols and Robustness

In robot navigation trials, OS was evaluated in Gazebo/ROS using a TurtleBot platform with RGB-D input. Testing spanned three heading angles and a range of distances (in metres) against a static chair. Key controlled variables included Mask R-CNN filtering, min–max normalization bounds, and precise camera calibrations. Baseline Mask R-CNN outputs frequently misclassified the chair at close range. After OS filtering, all angles and distances yielded robust relabelling as “chair” without false positives, demonstrating persistent spatial and semantic stability across challenging conditions (Choi et al., 2019).

For tracking, the OS implementation on nuScenes using CenterPoint detections and PointTracker produced AMOTA and MOTA gains over legacy count-based logic. Late-fusion experiments (LiDAR-camera) confirmed similar gains over voting-based schemes. Empirically, the “up-mul” update and a modest decay rate were critical; larger decay or additive updates degraded performance. Optimal threshold settings further improved both recall and precision (Benbarka et al., 2021).

6. Computational Efficiency and Comparison to Alternatives

Object Score algorithms offer substantial runtime benefits:

  • In navigation-based OS, only lightweight 2D IoU and a linear 3D-to-2D projection (one per stored object) are needed, avoiding the considerable runtime overhead of 3D bounding-box and 3D IoU computation.
  • In tracking, each score update is one arithmetic operation per object, compared to count-based logic that needs more state and is less sensitive to variable detection reliability.

Replacing hard instance counters with OS (continuous score), together with assessment via geometric association, reduces redundant computations and the invocation rate of downstream navigation or SLAM modules. Both the navigation and tracking OS schemes are thus compatible with real-time, large-scale, and memory-efficient deployment (Choi et al., 2019, Benbarka et al., 2021).

7. Comparative Advantages and Observed Trade-Offs

Distinctive advantages of OS-based approaches, as reported in the foundational papers, include:

  • Greater robustness to viewpoint and range variation (navigation),
  • Enhanced AMOTA and MOTA in multi-object tracking (+1.6 and +2.3, respectively, on nuScenes validation),
  • Smoother, more reliable instance lifetimes without the need for hard counter-based logic,
  • Competitive or superior late-fusion ensembling versus voting-based aggregation (with up to +1.67 AMOTA gains),
  • Trivial computational demand and straightforward hyperparameter tuning (primarily decay rate and activation thresholds).

Additive score updates were found to produce spurious overconfidence (inflating false positive/identity switches). The multiplicative-complement “up-mul” rule yielded the best balance between incorporating new detection evidence and maintaining trajectory stability (Benbarka et al., 2021).
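The overconfidence effect can be seen in a small numerical comparison. The sketch below is illustrative only: the additive rule is assumed to be a clipped sum, the multiplicative-complement rule is written as $1 - (1 - c)(1 - s)$, decay is omitted for clarity, and the function names are hypothetical.

```python
from typing import Callable, Sequence


def up_add(score: float, det_conf: float) -> float:
    """ASSUMED additive update: sum of scores, clipped at 1."""
    return min(1.0, score + det_conf)


def up_mul(score: float, det_conf: float) -> float:
    """Multiplicative-complement update: 1 - (1 - c)(1 - s)."""
    return 1.0 - (1.0 - score) * (1.0 - det_conf)


def run(update: Callable[[float, float], float], confs: Sequence[float]) -> float:
    """Fuse a sequence of matched detection confidences (decay omitted)."""
    score = confs[0]  # the initial score is the first detection confidence
    for conf in confs[1:]:
        score = update(score, conf)
    return score
```

For three consecutive low-confidence detections of 0.4, the additive rule saturates at 1.0, whereas the multiplicative-complement rule gives about 0.784. The latter approaches 1 only asymptotically, so weak detections cannot drive a tracklet to full confidence in a few frames.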

A plausible implication is that Object Score, in both navigation and tracking settings, offers a unifying approach to dynamic object reliability assessment, bridging detector confidence with spatiotemporal and geometric persistence. The reported results suggest it can outperform legacy count-based techniques on resource-constrained, real-time robotic platforms.
