
Success Weighted by Path Length (SPL) in Navigation

Updated 10 April 2026
  • Success weighted by Path Length (SPL) is a quantitative metric that evaluates navigation performance by combining success rate with spatial efficiency.
  • It computes the ratio of the geodesic optimal path length to the agent's actual travel distance, rewarding direct and efficient routes.
  • SPL is widely applied in robotics and embodied AI to benchmark visual navigation, object search, and language-guided movement in diverse environments.

Success weighted by Path Length (SPL) is a widely adopted quantitative metric for evaluating the combined efficacy and efficiency of embodied agents and mobile robots in navigation and active visual search tasks. SPL balances the agent’s success rate with the path optimality relative to the ground-truth shortest route, providing a single score that reflects both task completion and spatial efficiency. The measure is used extensively in visual navigation, object search, and language-guided movement in both 2D and 3D environments.

1. Formal Definition of SPL

SPL is computed over $N$ evaluation episodes as follows:

$$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{L_i}{\max(L_i, P_i)}$$

where:

  • $N$: total number of episodes or trials (scenes × queries),
  • $S_i \in \{0,1\}$: success indicator for episode $i$ ($1$ if the task was solved within the allowed budget and success criteria; $0$ otherwise),
  • $L_i$: length (in meters) of the optimal or geodesic path from the agent’s starting pose to the goal or target (typically computed with a planner such as A* or RRT on a ground-truth map),
  • $P_i$: the actual path length (in meters) traveled by the agent in that episode until termination (whether due to success or budget/timeout).

The ratio $L_i / \max(L_i, P_i)$ measures efficiency: it equals 1 when the executed path is a shortest path and decreases as the agent deviates or wanders from the optimal route. Failed episodes contribute zero to the sum but still count toward $N$, so they lower the average. This construction ensures that both the frequency of successful task completion and the directness of successful trajectories are encoded in a single measure bounded in $[0, 1]$ (Park et al., 2022, Yokoyama et al., 2021, Song et al., 29 Oct 2025).
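The formula translates directly into a few lines of code. The sketch below is a minimal Python version, assuming the per-episode values are already available as parallel lists and that every $L_i > 0$; the function name `spl` is illustrative and not taken from any benchmark toolkit.

```python
from typing import Sequence


def spl(successes: Sequence[int], shortest: Sequence[float], taken: Sequence[float]) -> float:
    """Success weighted by Path Length over N episodes (illustrative sketch).

    successes[i] -> S_i in {0, 1}
    shortest[i]  -> L_i, geodesic shortest-path length in meters (assumed > 0)
    taken[i]     -> P_i, distance actually traveled in meters
    """
    n = len(successes)
    if n == 0 or len(shortest) != n or len(taken) != n:
        raise ValueError("expected N > 0 episodes with matching S, L, P lists")
    total = 0.0
    for s, l, p in zip(successes, shortest, taken):
        # Per-episode term S_i * L_i / max(L_i, P_i): 1 for a shortest path,
        # smaller for detours, 0 for failures (which still count toward N).
        total += s * l / max(l, p)
    return total / n


# A shortest-path success, a success with a 2x detour, and a failure.
print(spl([1, 1, 0], [5.0, 4.0, 6.0], [5.0, 8.0, 3.0]))  # 0.5
```

The failed third episode contributes nothing to the sum yet still divides it by 3, which is how SPL folds the success rate into the score.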

2. Intuition and Interpretive Context

The SPL metric is designed to resolve a limitation of raw success rate (SR), which fails to penalize excessive exploration, circuitous navigation, or inefficient search. By multiplying path efficiency by success,

  • Only episodes where the agent fulfills the task within constraints are credited,
  • Trajectories closely matching the geodesic receive maximal credit,
  • Long detours, repeated revisits, or dithering substantially reduce the per-episode score,
  • Frequent failure (many episodes with $S_i = 0$) and extensive path inefficiency both lower the aggregate score.
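The difference between SR and SPL is easiest to see with two hypothetical agents that succeed on exactly the same episodes but take different routes. All numbers below are invented for illustration.

```python
from typing import Sequence


def success_rate(successes: Sequence[int]) -> float:
    return sum(successes) / len(successes)


def spl(successes: Sequence[int], shortest: Sequence[float], taken: Sequence[float]) -> float:
    return sum(s * l / max(l, p) for s, l, p in zip(successes, shortest, taken)) / len(successes)


shortest = [10.0, 10.0, 10.0, 10.0]        # L_i for four hypothetical episodes
successes = [1, 1, 1, 0]                   # both agents fail only the last episode
direct_agent = [10.0, 10.0, 10.0, 4.0]     # P_i: follows the geodesic when it succeeds
wandering_agent = [25.0, 25.0, 25.0, 4.0]  # P_i: 2.5x detour on every success

for name, taken in [("direct", direct_agent), ("wandering", wandering_agent)]:
    print(f"{name}: SR={success_rate(successes):.2f} SPL={spl(successes, shortest, taken):.2f}")
# direct: SR=0.75 SPL=0.75
# wandering: SR=0.75 SPL=0.30
```

Success rate cannot tell the two agents apart, while SPL drops from 0.75 to 0.30 for the agent that detours.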

SPL is especially relevant in domains where physical resources (e.g., battery life, time, robot wear) are constrained and path optimality is as critical as eventual success (Song et al., 29 Oct 2025).

3. Procedural Computation

The computation of SPL proceeds as follows:

  1. For each evaluation episode:
    • The agent starts at a prescribed pose.
    • $L_i$ is computed via an optimal planner (A*, RRT) on the known occupancy grid or navigation mesh.
    • The agent executes its policy. The cumulative traveled distance $P_i$ is measured by integrating odometry along the actual executed trajectory.
    • The episode is marked as successful ($S_i = 1$) if the task-specific success criterion is satisfied (e.g., object found within IoU/IoA thresholds, UAV arrives within a spatial/visibility threshold, agent signals “stop” within the goal region, or similar).
    • Episodes terminated by timeouts, exceeded travel budgets, or constraint violations yield $S_i = 0$, contributing zero to the numerator.
  2. For each episode, compute $S_i \, L_i / \max(L_i, P_i)$.
  3. SPL is averaged over all episodes to produce the summary statistic.
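The per-episode procedure can be sketched end to end on a toy occupancy grid. This is a minimal illustration under assumptions that do not come from the cited works: $L_i$ is obtained with A* on an 8-connected grid (diagonal moves may cut obstacle corners), $P_i$ is approximated by summing straight-line distances between recorded positions instead of integrating real odometry, the goal is assumed reachable, and success means ending within a hypothetical `success_radius` of the goal. All function names are illustrative.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]        # (row, col) on the grid
Point = Tuple[float, float]   # (row, col) scaled to meters


def shortest_path_length(grid: List[str], start: Cell, goal: Cell, cell_size: float = 1.0) -> float:
    """L_i: A* on an 8-connected occupancy grid where '#' marks obstacles."""
    rows, cols = len(grid), len(grid[0])
    moves = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

    def heuristic(cell: Cell) -> float:
        # Octile distance: admissible and consistent for 8-connected moves.
        dr, dc = abs(cell[0] - goal[0]), abs(cell[1] - goal[1])
        return (max(dr, dc) + (math.sqrt(2) - 1) * min(dr, dc)) * cell_size

    best: Dict[Cell, float] = {start: 0.0}
    frontier = [(heuristic(start), 0.0, start)]
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if g > best[cell]:
            continue  # stale entry: a cheaper route to this cell was already found
        if cell == goal:
            return g
        for dr, dc in moves:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]] == "#":
                continue
            cost = g + math.hypot(dr, dc) * cell_size
            if cost < best.get(nxt, math.inf):
                best[nxt] = cost
                heapq.heappush(frontier, (cost + heuristic(nxt), cost, nxt))
    return math.inf  # goal unreachable


def traveled_length(trajectory: Sequence[Point]) -> float:
    """P_i: sum of straight segments between consecutive recorded positions."""
    return sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))


def episode_score(grid: List[str], start: Cell, goal: Cell, trajectory: Sequence[Point],
                  success_radius: float = 1.0, cell_size: float = 1.0) -> float:
    """S_i * L_i / max(L_i, P_i) for one episode, using a hypothetical success rule."""
    l_i = shortest_path_length(grid, start, goal, cell_size)
    p_i = traveled_length(trajectory)
    goal_xy = (goal[0] * cell_size, goal[1] * cell_size)
    s_i = 1 if math.dist(trajectory[-1], goal_xy) <= success_radius else 0
    return s_i * l_i / max(l_i, p_i)


grid = [
    ".....",
    ".###.",
    ".....",
]
# The agent runs along the top row, then drops down to the goal: P_i = 4 + 2 = 6 m,
# while the shortest path (with one diagonal step) is L_i = 4 + sqrt(2) m.
print(round(episode_score(grid, (0, 0), (2, 4), [(0.0, 0.0), (0.0, 4.0), (2.0, 4.0)]), 3))  # 0.902
```

Swapping the grid planner for a navigation-mesh geodesic, or the segment sum for integrated wheel odometry, changes only how $L_i$ and $P_i$ are obtained; the final scoring line stays the same.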

Notable implementation specifics include continuous-valued path lengths (rather than step counts), per-domain success definitions (e.g., an IoU threshold for object detection (Park et al., 2022), or a distance threshold in meters combined with goal visibility for UAVs (Song et al., 29 Oct 2025)), and strict per-episode travel budgets (Park et al., 2022).
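Because success definitions are domain-specific, it can help to keep them as separate predicates that feed $S_i$. The sketch below assumes axis-aligned 2D boxes for the IoU test, a Euclidean distance-plus-visibility test for UAV-style goals, and a per-episode travel budget; the threshold values and function names are hypothetical placeholders, not the settings used by Park et al. or Song et al.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_success(pred: Box, target: Box, iou_threshold: float) -> int:
    """S_i for object search: the predicted box overlaps the target enough."""
    return int(iou(pred, target) >= iou_threshold)


def proximity_success(final_pos: Sequence[float], goal_pos: Sequence[float],
                      radius_m: float, goal_visible: bool) -> int:
    """S_i for goal reaching: within radius_m meters of the goal and the goal in view."""
    return int(math.dist(final_pos, goal_pos) <= radius_m and goal_visible)


def apply_budget(s_i: int, p_i: float, budget_m: float) -> int:
    """Force failure when the traveled distance P_i exceeds the episode budget."""
    return s_i if p_i <= budget_m else 0


# Hypothetical thresholds, chosen only for illustration.
print(detection_success((0, 0, 2, 2), (1, 0, 3, 2), iou_threshold=0.5))  # IoU = 1/3 -> 0
print(apply_budget(proximity_success((0.0, 0.0, 5.0), (0.5, 0.0, 5.0), 1.0, True), 12.0, 10.0))  # 0
```

In the second call the agent does reach the goal, but exceeding the 10 m budget still forces $S_i = 0$.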

4. Use Cases and Variations in Practice

SPL has been utilized as the primary evaluation criterion in:

  • Active visual object search (Park et al., 2022), where SPL rewards both accurate object localization and efficient environment traversal.
  • Vision-language navigation and UAV task-centric benchmarks (Song et al., 29 Oct 2025), where SPL highlights methods that combine semantic competence with geometric discipline.
  • General embodied goal-driven navigation (Yokoyama et al., 2021), frequently serving as the touchstone metric since Anderson et al. (2018).

The metric adapts across settings with adjustments in success criteria (e.g., IoU/IoA for visual detection, spatial proximity for navigation), path length computation methodology, and handling of physical versus simulated odometry. Recent works report per-task SPL and ablate method variants, consistently finding that removal of semantically or geometrically informed modules results in degraded SPL (Park et al., 2022).

SPL Across Benchmarks

| Scenario | Prior Best SPL | Method/Configuration | SPL | SPL Gain |
|---|---|---|---|---|
| RoboThor, multi-room | 0.1154 | ZAVIS (full) | 0.3462 | +0.2308 |
| AI2-THOR Bedroom (30 scenes) | 0.2334 | ZAVIS (full) | 0.3017 | +0.0683 |
| AI2-THOR Livingroom (30 scenes) | 0.1410 | ZAVIS (full) | 0.1969 | +0.0559 |
| SoraNav, UAV 2.5D | 0.46 | SoraNav | 0.54 | +17% |
| SoraNav, UAV 3D | 0.48 | SoraNav | 0.57 | +18.5% |

This table summarizes representative SPL values and improvements as reported in (Park et al., 2022) and (Song et al., 29 Oct 2025). Gains for the ZAVIS rows are absolute SPL differences; gains for the SoraNav rows are relative percentages.
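The Gain column mixes two conventions, which the short computation below makes explicit using two rows from the table: the ZAVIS gain matches an absolute difference, while the SoraNav gain matches a relative improvement. The function names are illustrative.

```python
def absolute_gain(prior: float, new: float) -> float:
    """Difference in SPL, as in the ZAVIS rows."""
    return new - prior


def relative_gain_pct(prior: float, new: float) -> float:
    """Improvement as a percentage of the prior SPL, as in the SoraNav rows."""
    return 100.0 * (new - prior) / prior


print(f"RoboThor, multi-room: {absolute_gain(0.1154, 0.3462):+.4f}")  # +0.2308
print(f"SoraNav, UAV 2.5D: {relative_gain_pct(0.46, 0.54):+.1f}%")    # +17.4%
```

The table rounds the second value to +17%.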

5. Limitations, Extensions, and Pitfalls

SPL, while simple and widely applicable, rests on assumptions that limit its suitability in some robotics scenarios (Yokoyama et al., 2021):

  • The path length $L_i$ is computed as the shortest collision-free spatial geodesic, independent of the agent’s kinematic or dynamic constraints. For agents with non-trivial dynamics (e.g., unicycles, UAVs with smooth arc motion), the true time-optimal path may be spatially longer but temporally shorter. Because such a path has $P_i > L_i$, SPL penalizes these “dynamics-aware” solutions, potentially undervaluing practical or energetically preferable policies.
  • SPL is insensitive to idling or oscillatory behaviors at a single spatial location. If agents stall or exhibit redundant heading changes without translating, path length remains unchanged and excess time is unpenalized.
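Both limitations can be reproduced with a few hypothetical trajectories around a single corner. Stalling at the corner adds recorded poses but no distance, so the SPL term is unchanged; a smooth arc that is longer in space is penalized regardless of how quickly it is traversed. The coordinates, the assumed 4 m geodesic, and the 4.4 m arc length are invented for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def path_length(trajectory: Sequence[Point]) -> float:
    return sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))


def spl_term(success: int, l_i: float, p_i: float) -> float:
    return success * l_i / max(l_i, p_i)


L_I = 4.0  # hypothetical geodesic: 2 m forward, sharp 90-degree turn, 2 m sideways

direct = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]
# Same route, but the agent idles at the corner for three extra control steps
# (e.g. rotating in place): more time elapses, yet no distance is added.
stalling = [(0.0, 0.0), (2.0, 0.0), (2.0, 0.0), (2.0, 0.0), (2.0, 0.0), (2.0, 2.0)]

for name, traj in [("direct", direct), ("stalling", stalling)]:
    p_i = path_length(traj)
    print(f"{name}: steps={len(traj) - 1} P_i={p_i:.1f} SPL term={spl_term(1, L_I, p_i):.3f}")

# A hypothetical 4.4 m smooth arc that an agent unable to turn in place might follow.
print(f"arc: SPL term={spl_term(1, L_I, 4.4):.3f}")  # 0.909
```

The stalling run takes more than twice as many steps yet scores the same 1.000 as the direct run, while the arc loses credit purely for its extra distance.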

To address these deficiencies, Success weighted by Completion Time (SCT) was introduced in (Yokoyama et al., 2021). SCT replaces distance with time: each successful episode is scored against the minimum completion time attainable under the agent’s motion constraints, so the motion model is built into the metric. SPL nevertheless remains the simplest and most interpretable metric of spatial navigation efficiency for agents with idealized or point-turn dynamics.
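Assuming SCT keeps SPL's structure with times in place of lengths, i.e. $\mathrm{SCT} = \frac{1}{N}\sum_{i=1}^{N} S_i \, \frac{C_i}{\max(C_i, T_i)}$ with $C_i$ the minimum attainable completion time under the agent's motion model and $T_i$ the agent's actual completion time, a minimal sketch looks like this. Computing $C_i$ itself requires a dynamics-aware planner and is not shown; the times below are hypothetical.

```python
from typing import Sequence


def sct(successes: Sequence[int], min_times: Sequence[float], times: Sequence[float]) -> float:
    """Success weighted by Completion Time (sketch mirroring the SPL form).

    min_times[i] -> C_i, fastest completion time achievable under the motion model (s)
    times[i]     -> T_i, the agent's actual completion time (s)
    """
    n = len(successes)
    return sum(s * c / max(c, t) for s, c, t in zip(successes, min_times, times)) / n


# An agent that follows the geodesic exactly (SPL term 1.0) but idles for 4 s
# beyond a hypothetical 8 s optimum is penalized only by the time-based metric.
print(f"{sct([1], [8.0], [12.0]):.3f}")  # 0.667
```

Unlike SPL, this score reacts to the stalling behavior described above, because idle time increases $T_i$ even when no distance is added.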

6. Comparative Role and Interpretive Significance

SPL serves as a key comparator in ablation studies, cross-method benchmarks, and for exposing pathological exploration rooted in semantic or geometric misinterpretations. High SPL correlates with direct, non-redundant exploration (minimal revisiting, avoidance of dead-ends) and is sensitive to both advances in sensory inference and improvements in semantic or geometric planning (Song et al., 29 Oct 2025). It enables quantitative assessment of system upgrades and is widely reported across embodied AI, robotics, and cross-modal navigation literature as a primary measure of navigation and search competence.

7. Summary of Empirical Impact

High or improved SPL values, as achieved by methods leveraging commonsense co-occurrence priors, semantic uncertainty, or geometry-based planning modules, reflect advances not merely in detection but in direct, parsimonious route execution (Park et al., 2022, Song et al., 29 Oct 2025). Conversely, SPL exposes agents whose successes come only after circuitous, hesitant, or overly exploratory trajectories. It is thus indispensable for holistic evaluation of both navigation and search intelligence, especially when practical deployment efficiency is a system requirement.
