SPL: Navigation Efficiency Metric
- SPL is a standardized metric that evaluates navigation performance by combining success rate with path optimality through a normalized score.
- It penalizes detours by comparing the executed trajectory length with the geodesic shortest-path distance, ensuring only efficient routes score highly.
- Empirical studies using SPL have driven advancements in both simulation and real-world navigation benchmarks, influencing algorithm design and performance evaluation.
Success weighted by Path Length (SPL) is a standard scalar navigation efficiency metric used to evaluate embodied and autonomous agents in goal-directed navigation tasks. SPL quantifies both the reliability (success rate) and the efficiency (path optimality) of navigation policies, providing a normalized score that facilitates fair comparison across heterogeneous methods and environments. SPL has become the principal metric for benchmarking classic and learned navigation pipelines in simulated and real-world environments, and is foundational in recent works targeting embodied object-goal navigation and similar scenarios (Chabal et al., 30 Nov 2025, Mishkin et al., 2019).
1. Formal Definition and Computation
SPL is measured over a set of $N$ navigation episodes. For episode $i$:
- $S_i$: success indicator (1 if the agent reaches the goal within the defined success criteria, 0 otherwise)
- $\ell_i$: geodesic shortest-path distance from the start to the goal or nearest valid stopping location ("ideal" path length)
- $p_i$: actual trajectory length traversed by the agent ("executed" path)
The SPL metric is given by
$$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i, \ell_i)}$$
This formulation caps per-episode contribution at 1, only credits successful episodes, and down-weights episodes with unnecessarily long paths (Mishkin et al., 2019, Chabal et al., 30 Nov 2025, Yokoyama et al., 2021).
Stepwise computation for each episode:
- Determine success: $S_i = 1$ if the agent terminates within the success proximity (e.g., within 1 m of the goal and meeting the visibility threshold), else $S_i = 0$.
- Compute ideal distance: $\ell_i$ via geodesic shortest path (e.g., A*, FMM), often measured on 2D or 3D navigation meshes.
- Accumulate executed path length: $p_i$ as the sum of Euclidean distances over movement steps; rotations are typically not counted unless explicitly specified (see below).
- Compute per-episode score: $S_i \cdot \ell_i / \max(p_i, \ell_i)$, which is zero if the episode is unsuccessful.
- Average over all episodes for the final SPL.
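The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the function names `path_length` and `spl` and their argument layout are hypothetical, the success flags and geodesic distances are assumed to be supplied by the benchmark, and the episode values at the bottom are made up for illustration.

```python
import math
from typing import Sequence


def path_length(positions: Sequence[Sequence[float]]) -> float:
    """Executed path length: sum of Euclidean distances between consecutive positions."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def spl(successes: Sequence[int],
        shortest: Sequence[float],
        executed: Sequence[float]) -> float:
    """Average of S_i * l_i / max(p_i, l_i) over all episodes."""
    if not (len(successes) == len(shortest) == len(executed)):
        raise ValueError("all inputs must have one entry per episode")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")
    total = 0.0
    for s, l, p in zip(successes, shortest, executed):
        if l <= 0:
            # Assumption of this sketch: start and goal never coincide.
            raise ValueError("geodesic distance must be positive")
        if s:  # failed episodes contribute zero
            total += l / max(p, l)  # max() caps the per-episode score at 1
    return total / len(successes)


# Hypothetical episodes (illustrative values only).
successes = [1, 0, 1]
shortest = [10.0, 8.0, 5.0]   # geodesic lengths l_i in metres
executed = [15.0, 20.0, 5.0]  # executed lengths p_i in metres
print(round(spl(successes, shortest, executed), 3))  # 0.556
```

Failed episodes stay in the denominator, so an agent cannot raise its score by giving up early on hard episodes.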
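A numerical example as reported in (Mishkin et al., 2019):
- Episode 1: $S_1 = 1$, $\ell_1 / \max(p_1, \ell_1) = 0.667$, yields $0.667$
- Episode 2: $S_2 = 0$, yields $0$
- Episode 3: $S_3 = 1$, $\ell_3 / \max(p_3, \ell_3) = 1$, yields $1$
Final $\mathrm{SPL} = (0.667 + 0 + 1)/3 \approx 0.556$.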
2. Rationale for SPL and Comparative Advantages
SPL is designed to balance goal-reaching reliability with path optimality. Unlike Success Rate (SR) or mean path length (PL) alone, SPL:
- Penalizes Detours: Reaching the goal via a sub-optimal route scales the episode's score by the ratio $\ell_i / \max(p_i, \ell_i)$ of shortest-path length to executed length, discouraging circuitous exploration (Chabal et al., 30 Nov 2025).
- Credits Only Successful Episodes: SPL is zero for failures, thus not inflated by agents that merely wander extensively.
- Normalized and Comparable: SPL is always in $[0, 1]$ and supports direct comparison across agents and environment complexities.
- Realistic Efficiency Measure: It accounts for real-world requirements such as minimizing energy and time by favoring near-geodesic, successful trajectories.
In embodied navigation contexts, SPL is preferred over SR or PL alone for its stricter coupling of success with navigation efficiency (Chabal et al., 30 Nov 2025, Mishkin et al., 2019).
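To make the contrast with Success Rate concrete, the following sketch compares two hypothetical agents that succeed on the same episodes but take very different routes. All numbers and the helper names `success_rate` and `spl` are assumptions made for illustration.

```python
def success_rate(successes):
    """Fraction of episodes in which the agent succeeded."""
    return sum(successes) / len(successes)


def spl(successes, shortest, executed):
    """Mean of S_i * l_i / max(p_i, l_i) over episodes."""
    terms = [s * l / max(p, l) for s, l, p in zip(successes, shortest, executed)]
    return sum(terms) / len(terms)


# Same three episodes for both agents (illustrative values, in metres).
shortest = [10.0, 8.0, 5.0]
successes = [1, 1, 0]            # both agents fail the third episode
direct_paths = [10.0, 8.0, 12.0]     # near-geodesic agent
wandering_paths = [30.0, 32.0, 12.0]  # agent that explores extensively

sr = success_rate(successes)
spl_direct = spl(successes, shortest, direct_paths)
spl_wandering = spl(successes, shortest, wandering_paths)

print(f"SR (both agents): {sr:.3f}")            # 0.667
print(f"SPL direct:       {spl_direct:.3f}")    # 0.667
print(f"SPL wandering:    {spl_wandering:.3f}")  # 0.194
```

Success Rate cannot tell the two agents apart, whereas SPL separates them by the length of the detours on the successful episodes.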
3. Implementation Protocols in Benchmarking
General Protocol
- Success Threshold: Defined by proximity (typically 1 m) and/or visibility to the goal; agent must execute a STOP action within this region.
- Navigation Budget: Maximum number of steps allowed per episode (commonly 500 discrete steps).
- Trajectory Measurement: Only translational motions are accumulated in the executed path length $p_i$ unless the protocol includes penalization for rotations (as in FOM-Nav).
- Geodesic Computation: Ideal path is precomputed using algorithms like A* or FMM on the environment's obstacle map or ground truth mesh.
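As an illustration of the geodesic step, the sketch below computes a shortest-path distance on a 2D occupancy grid. It uses Dijkstra's algorithm, swapped in here as a simpler stand-in for A* or FMM. The function name `grid_geodesic`, the boolean grid layout, and the `cell_size` parameter are assumptions of this sketch and do not reflect any specific benchmark's implementation. An 8-connected grid also only approximates the continuous geodesic that a navigation mesh would give.

```python
import heapq
import math
from typing import List, Tuple


def grid_geodesic(free: List[List[bool]],
                  start: Tuple[int, int],
                  goal: Tuple[int, int],
                  cell_size: float = 1.0) -> float:
    """Shortest 8-connected path length from start to goal over free cells.

    Returns math.inf if the goal is unreachable.
    """
    rows, cols = len(free), len(free[0])
    diag = math.sqrt(2.0)
    moves = [(-1, 0, 1.0), (1, 0, 1.0), (0, -1, 1.0), (0, 1, 1.0),
             (-1, -1, diag), (-1, 1, diag), (1, -1, diag), (1, 1, diag)]
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return d * cell_size
        if d > dist.get((r, c), math.inf):
            continue  # stale queue entry
        for dr, dc, w in moves:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols and free[nr][nc]):
                continue
            # Forbid diagonal moves that would cut an obstacle corner.
            if dr != 0 and dc != 0 and not (free[r][nc] and free[nr][c]):
                continue
            nd = d + w
            if nd < dist.get((nr, nc), math.inf):
                dist[(nr, nc)] = nd
                heapq.heappush(heap, (nd, (nr, nc)))
    return math.inf


# A wall in the middle column forces a detour around its bottom end.
grid = [
    [True, False, True],
    [True, False, True],
    [True, True,  True],
]
print(grid_geodesic(grid, (0, 0), (0, 2), cell_size=0.25))  # 1.5
```

The resulting value plays the role of the ideal path length in the SPL ratio; a coarser grid or a different connectivity would change it, which is one reason protocols fix the geodesic computation in advance.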
FOM-Nav Specifics
- Trajectory Length: Includes all forward 25 cm motions; for HM3D v2, only 2D displacements are counted due to height annotation inconsistencies.
- Rotation: Episodes typically start with a 360° rotation; these steps are added to the executed path length $p_i$ and penalize SPL if excessive (Chabal et al., 30 Nov 2025).
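The 2D-displacement convention can be sketched as below. This is a hedged illustration rather than FOM-Nav's code: the function name `planar_path_length` is hypothetical, positions are assumed to be `(x, y, z)` tuples with index 1 as the vertical axis, and the contribution of rotation steps is not modeled because the text above does not specify how it is computed.

```python
import math
from typing import Sequence


def planar_path_length(positions: Sequence[Sequence[float]],
                       up_axis: int = 1) -> float:
    """Sum of per-step displacements projected onto the ground plane.

    The coordinate at index `up_axis` is ignored, so changes in height
    do not add to the executed path length.
    """
    total = 0.0
    for a, b in zip(positions, positions[1:]):
        squared = sum((b[k] - a[k]) ** 2 for k in range(3) if k != up_axis)
        total += math.sqrt(squared)
    return total


# Two forward steps of 0.25 m; the first also climbs 0.1 m (illustrative).
trajectory = [(0.0, 0.0, 0.0), (0.25, 0.1, 0.0), (0.25, 0.1, 0.25)]
print(planar_path_length(trajectory))  # 0.5
```

Dropping the vertical component keeps the executed length insensitive to inconsistent height annotations, at the cost of slightly under-counting motion on stairs or ramps.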
Table: Success Criteria and Path Calculation (as per FOM-Nav)
| Protocol Aspect | FOM-Nav Specification | Standard Benchmark Specification (Mishkin et al., 2019) |
|---|---|---|
| Success distance | 1 m + visibility | 0.2 m (goal radius) |
| Path measurement | 25 cm translation per step; includes rotations | Only forward translation; rotations ignored |
| Geodesic estimation | FMM on learned/auto map | A* on ground-truth mesh |
| Step/time budget | 500 steps | 500 steps or 50 s |
4. Empirical Results and Component Analysis
SPL serves as the principal evaluation metric in recent navigation benchmarks.
Comparative Performance (SPL in %, from FOM-Nav (Chabal et al., 30 Nov 2025))
| Method | MP3D_sub SPL | HM3D v1_sub SPL | HM3D v2 SPL |
|---|---|---|---|
| RIM | 15.8 | 27.6 | 22.2 |
| PIRLNav | — | 34.7 | 27.0 |
| VLFM* | 19.6 | 37.6 | 33.0 |
| VLFM† | 19.8 | 39.6 | 33.6 |
| FOM-Nav | 23.9 | 52.1 | 47.9 |
Incremental improvements in SPL can be attributed to architectural modifications. In FOM-Nav, moving from the basic to the full model improved SPL, and ablation studies show contributions from explicit object/scene encoding, classical planning, and mixed ground-truth plus auto-generated map data.
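The margins implied by the comparison table can be checked with a few lines of arithmetic. The sketch below only re-uses the SPL values listed in the table; the dictionary layout and the helper name `gain_over_best_baseline` are assumptions made for illustration.

```python
# SPL values (%) copied from the comparison table; None marks a missing entry.
spl_table = {
    "RIM":     {"MP3D_sub": 15.8, "HM3D v1_sub": 27.6, "HM3D v2": 22.2},
    "PIRLNav": {"MP3D_sub": None, "HM3D v1_sub": 34.7, "HM3D v2": 27.0},
    "VLFM*":   {"MP3D_sub": 19.6, "HM3D v1_sub": 37.6, "HM3D v2": 33.0},
    "VLFM†":   {"MP3D_sub": 19.8, "HM3D v1_sub": 39.6, "HM3D v2": 33.6},
    "FOM-Nav": {"MP3D_sub": 23.9, "HM3D v1_sub": 52.1, "HM3D v2": 47.9},
}


def gain_over_best_baseline(table, method):
    """Absolute SPL gain (percentage points) of `method` over the best other row."""
    gains = {}
    for bench, value in table[method].items():
        others = [row[bench] for name, row in table.items()
                  if name != method and row[bench] is not None]
        gains[bench] = round(value - max(others), 1)
    return gains


gains = gain_over_best_baseline(spl_table, "FOM-Nav")
print(gains)
```

On these numbers the gain over the strongest listed baseline is 4.1 points on MP3D_sub, 12.5 on HM3D v1_sub, and 14.3 on HM3D v2.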
5. Limitations and Domain-Specific Caveats
SPL assumes that geodesic path length is a faithful surrogate for true navigation cost. It has notable limitations:
- Ignores Rotation and Idle Time: Standard SPL ignores non-translational actions, which may distort efficiency assessments, especially for agents with complex or nonholonomic dynamics (Yokoyama et al., 2021, Mishkin et al., 2019).
- Harsh Failure Treatment: Any failure, regardless of proximity to success, contributes zero.
- Dependence on Oracle Path: SPL presumes access to the geodesic distance $\ell_i$, which may not reflect reachable trajectories in dynamic, imperfectly mapped, or real-world scenarios (Mishkin et al., 2019).
- Insensitive to Dynamics: For agents with curved dynamics (e.g., unicycle models), the minimum-time path is not necessarily the shortest in distance. SPL can therefore underreport the efficiency of such agents compared to point-turn models (Yokoyama et al., 2021).
Alternative measures, such as Success weighted by Completion Time (SCT), address some of these issues by normalizing to minimum-time trajectories defined via agent dynamics (Yokoyama et al., 2021).
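A time-based analogue can be sketched by replacing path lengths with times. The code below assumes that SCT mirrors the SPL formula, with the minimum completion time under the agent's dynamics in place of the geodesic length and the agent's actual completion time in place of the executed length. The function name `sct` and all numeric values are hypothetical.

```python
from typing import Sequence


def sct(successes: Sequence[int],
        min_times: Sequence[float],
        agent_times: Sequence[float]) -> float:
    """Mean of S_i * T_i / max(C_i, T_i) over episodes.

    T_i: assumed minimum time to reach the goal given the agent's dynamics.
    C_i: time the agent actually took.
    """
    if not (len(successes) == len(min_times) == len(agent_times)):
        raise ValueError("all inputs must have one entry per episode")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")
    total = 0.0
    for s, t, c in zip(successes, min_times, agent_times):
        if t <= 0:
            raise ValueError("minimum completion time must be positive")
        if s:  # failed episodes contribute zero, as in SPL
            total += t / max(c, t)
    return total / len(successes)


# One successful episode by a hypothetical unicycle-style agent that follows
# a smooth arc: slightly longer than the geodesic, but time-optimal.
spl_term = 10.0 / max(11.0, 10.0)  # geodesic 10 m, executed 11 m
sct_value = sct([1], [20.0], [20.0])  # minimum time 20 s, actual time 20 s
print(round(spl_term, 3), sct_value)  # 0.909 1.0
```

In this illustrative case the length-based ratio penalizes the arc even though no faster trajectory exists for that agent, which is the mismatch a time-normalized measure is meant to remove.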
6. Influence on Navigation Research and Future Directions
SPL is the canonical metric for navigation tasks in the Habitat, Matterport3D, and HM3D evaluation protocols. Its widespread use supports reproducibility and apples-to-apples comparison across research groups and methodologies (Mishkin et al., 2019, Chabal et al., 30 Nov 2025).
The introduction of SPL has significantly influenced the design of embodied navigation agents, incentivizing methods that reliably reach goals without excessively circuitous behaviors. Contemporary research explores SPL-driven architectural ablations, data policies, multimodal perception pipelines, and hybrid classical-learning solutions for maximizing SPL on standard benchmarks.
Recent works also highlight the need for dynamics-aware or more nuanced efficiency criteria, such as SCT, to address SPL’s insensitivity to non-Euclidean agent motion, rotational inefficiencies, and application-specific energy/time budgets (Yokoyama et al., 2021).
7. References
- Chabal et al., "FOM-Nav: Frontier-Object Maps for Object Goal Navigation" (Chabal et al., 30 Nov 2025)
- Mishkin et al., "Benchmarking Classic and Learned Navigation in Complex 3D Environments" (Mishkin et al., 2019)
- Yokoyama et al., "Success Weighted by Completion Time: A Dynamics-Aware Evaluation Criteria for Embodied Navigation" (Yokoyama et al., 2021)