Hand-drawn Reference Path (HRP)
- Hand-drawn Reference Paths (HRP) are user-specified curves that capture nuanced spatial and semantic preferences for mobile robot navigation.
- Recent developments formalize HRPs as sequences of waypoints, polylines, or Bézier curves and integrate them into planning, perception, and control systems.
- Experimental validations show HRP-based navigation enhances task success and user experience through mixed reality interfaces and soft topological constraints.
A Hand-drawn Reference Path (HRP) is a user-specified curve, usually drawn freehand on a digital or physical interface, that prescribes a preferred or intended navigation route for a mobile robot. Unlike analytically computed shortest paths, the HRP captures nuanced spatial preferences and semantic constraints that are difficult to encode in purely geometric planners. Recent work has formalized HRPs as sequences of waypoints, polylines, or control points, and has integrated them into planning, perception, and control stacks using a combination of optimization, neural (“vision-language”) inference, soft topological constraints, and mixed reality interaction paradigms. This article synthesizes recent developments in HRP-centric navigation, drawing on methodological advances and benchmarking results spanning vision-language navigation, mixed reality interfaces, topological planning with homology constraints, and deep alignment of abstract sketch inputs with on-site sensory observations.
1. Formal Representation and Acquisition of HRPs
Hand-drawn Reference Paths are most commonly represented as ordered sets of planar waypoints sampled at regular spatial intervals or dictated by interactive gestures. In MRReP, each HRP is defined as an ordered set of 2D points in the map frame, $P = \{p_1, \dots, p_N\}$, $p_i = (x_i, y_i)$, with $p_i \in \mathbb{R}^2$ (Taki et al., 31 Mar 2026). Orientation at each point can be estimated via finite differences, and a new waypoint is captured only when its distance from the previously captured waypoint exceeds a threshold $d_{\min}$. Smoothing can be applied post hoc using moving averages or median filters, or via piecewise-linear or spline-based interpolation, to produce trajectories suitable for precise robot execution.
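The acquisition steps described above (distance-gated capture, finite-difference orientation, moving-average smoothing) can be sketched as follows. This is a minimal illustration, not the MRReP implementation: the function names, the `min_spacing` parameter, and the sample coordinates are all assumptions introduced here.

```python
import math

def capture_waypoints(samples, min_spacing):
    """Keep a raw sample only if it is farther than min_spacing from the last kept waypoint."""
    kept = []
    for x, y in samples:
        if not kept or math.hypot(x - kept[-1][0], y - kept[-1][1]) > min_spacing:
            kept.append((x, y))
    return kept

def estimate_headings(waypoints):
    """Forward finite-difference heading (radians); the last waypoint reuses its predecessor's."""
    if len(waypoints) < 2:
        return [0.0] * len(waypoints)
    headings = [math.atan2(y1 - y0, x1 - x0)
                for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:])]
    headings.append(headings[-1])
    return headings

def moving_average(waypoints, window=3):
    """Centered moving average; the two endpoints are left unchanged."""
    half = window // 2
    smoothed = []
    for i, point in enumerate(waypoints):
        if i == 0 or i == len(waypoints) - 1:
            smoothed.append(point)
            continue
        neighbors = waypoints[max(0, i - half):i + half + 1]
        smoothed.append((sum(p[0] for p in neighbors) / len(neighbors),
                         sum(p[1] for p in neighbors) / len(neighbors)))
    return smoothed

# Hypothetical raw gesture samples in the map frame (metres).
raw = [(0.0, 0.0), (0.01, 0.0), (0.10, 0.0), (0.12, 0.0), (0.20, 0.10)]
waypoints = capture_waypoints(raw, min_spacing=0.05)
headings = estimate_headings(waypoints)
smoothed = moving_average(waypoints)
```

Keeping the endpoints fixed during smoothing is a design choice made for this sketch so that the start and goal the user drew are preserved.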
Alternate models, such as those used in HAM-Nav, treat the path as a freehand curve annotated over a sketched map $M = (S, L, P)$, where $S$ encodes structure, $L$ collects labeled landmarks, and $P$ tracks the hand-drawn path segment from origin $p_{\mathrm{start}}$ to destination $p_{\mathrm{goal}}$ (Tan et al., 31 Jan 2025). In SkeNa, the HRP is rendered as a polyline or a sequence of Bézier curves, constructed either manually or by algorithmic transformation of shortest-path ground truth into a human-like sketch (Xu et al., 5 Aug 2025).
2. From HRP to Robot-Admissible Path: Processing and Integration
A canonical pipeline for HRP-based planning involves filtering, interpolation, and projection of hand-drawn sequences into navigation stack-compatible waypoints or polylines. In MRReP, this proceeds via noise filtering (distance threshold and optional smoothing), interpolation for continuity (linear or B-spline), and collision checking with respect to the environment cost map. For each waypoint $p_i$, collision-free passage is enforced: the waypoint must lie in a cost-map cell that is not marked as an obstacle. The planner aims to minimize deviation from the user's input while maintaining safety margins (Taki et al., 31 Mar 2026). The practical implementation for Navigation2 systems involves a custom global planner plugin that returns the HRP as the global path, with fallback to conventional A* if no HRP is present.
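The interpolation and collision-checking stages can be illustrated with a small grid example. This is a sketch under stated assumptions rather than the plugin's actual code: the cost map is a plain list of rows with its origin at (0, 0), the `lethal` threshold of 254 is an assumed convention, and `densify` and `first_collision` are hypothetical helper names.

```python
import math

def densify(points, step):
    """Linearly interpolate so that consecutive points are at most `step` apart."""
    if not points:
        return []
    dense = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / step))
        for k in range(1, n + 1):
            t = k / n
            dense.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return dense

def first_collision(points, costmap, resolution, lethal=254):
    """Index of the first point in a lethal or out-of-map cell, or None if the path is clear."""
    rows, cols = len(costmap), len(costmap[0])
    for i, (x, y) in enumerate(points):
        col = math.floor(x / resolution)
        row = math.floor(y / resolution)
        if not (0 <= row < rows and 0 <= col < cols) or costmap[row][col] >= lethal:
            return i
    return None

# 3 x 3 map with 1 m cells and a single lethal cell in the middle.
costmap = [[0, 0, 0],
           [0, 254, 0],
           [0, 0, 0]]
diagonal = densify([(0.5, 0.5), (2.5, 2.5)], step=0.5)            # cuts through the obstacle
around = densify([(0.5, 0.5), (2.5, 0.5), (2.5, 2.5)], step=0.5)  # skirts it
blocked_at = first_collision(diagonal, costmap, resolution=1.0)
clear = first_collision(around, costmap, resolution=1.0)
```

A caller could use a non-`None` result to reject the drawn path, or to hand the blocked segment to a conventional planner.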
In vision-language architectures like HAM-Nav, the HRP is rasterized into an occupancy grid, and k-means is applied to free-space cells to generate candidate robot pose nodes. The HRP thus becomes a set of edges overlaying a topological graph, rather than an explicit path with world-metric correspondence (Tan et al., 31 Jan 2025). SkeNa sidesteps explicit metric matching by defining the HRP as part of the sketch map $M$, aligned via grid-cropped regions and rendered in abstraction styles, either as straight-line traces or as composite Bézier strokes (Xu et al., 5 Aug 2025).
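To make the clustering step concrete, the sketch below runs plain Lloyd's k-means over the free cells of a toy occupancy grid and returns the cluster centers as candidate pose nodes. It is not HAM-Nav's code: the grid, the value of `k`, and the deterministic farthest-point initialization are assumptions chosen to keep the example reproducible.

```python
def free_cells(grid):
    """Return (row, col) of every free cell; 0 marks free space, 1 marks occupied."""
    return [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == 0]

def sq_dist(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, max_iters=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        updated = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centers:  # assignments no longer change
            break
        centers = updated
    return centers

# Two small rooms separated by a wall (the column of 1s).
grid = [[0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0]]
pose_nodes = kmeans(free_cells(grid), k=2)
```

On this grid the two centers settle in the middle of each room, which is the behavior one would want from pose nodes in a topological graph.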
3. HRP-Based Planning: Topological and Optimization Approaches
Beyond direct path playback, HRPs have motivated planners that preserve topological properties, admit soft constraint trade-offs, or facilitate robust execution under environmental uncertainty. In configuration spaces with obstacles, “Efficient Path Planning with Soft Homology Constraints” introduces the H★ algorithm. Let $\gamma_{\mathrm{ref}}$ denote a reference path and $\gamma$ a candidate. The cost to minimize is
$$J_\lambda(\gamma) = \ell(\gamma) + \lambda \, d_H(\gamma, \gamma_{\mathrm{ref}}),$$
where $\ell(\gamma)$ is path length and $d_H(\gamma, \gamma_{\mathrm{ref}})$ measures the difference in harmonic homology signature (obtained by projection onto the kernel of the 1-Laplacian on the domain triangulation) (Taveras et al., 2024). By tuning $\lambda \ge 0$, one trades off strict homology (topological similarity) with geometric optimality. The H★ algorithm performs best-first search in the original graph, incrementally updating both length and homology signature, producing a family of paths ranging from shortest overall to shortest within the reference class. Rollout refinements further improve path optimality while retaining topological fidelity.
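The effect of the trade-off weight can be seen in a small worked example. The sketch below is not H★ itself: it scores a handful of pre-enumerated candidates by brute force, assuming a cost of the form "length plus weight times signature gap". The signature vectors, lengths, and the use of a Euclidean gap are hypothetical stand-ins for the harmonic signatures in the paper.

```python
import math

def soft_homology_cost(length, signature, ref_signature, lam):
    """Path length plus lam times the Euclidean gap between homology signatures."""
    return length + lam * math.dist(signature, ref_signature)

def best_candidate(candidates, ref_signature, lam):
    """Name of the candidate with the lowest soft-constrained cost."""
    return min(
        candidates,
        key=lambda name: soft_homology_cost(
            candidates[name]["length"], candidates[name]["signature"], ref_signature, lam
        ),
    )

# Hypothetical candidates around a single obstacle: one passes on the other
# side from the reference (shorter), one passes on the same side (longer).
candidates = {
    "shortest_overall": {"length": 10.0, "signature": (1.0,)},
    "same_class_as_reference": {"length": 14.0, "signature": (0.0,)},
}
ref_signature = (0.0,)

pick_geometric = best_candidate(candidates, ref_signature, lam=0.0)
pick_topological = best_candidate(candidates, ref_signature, lam=10.0)
```

With these numbers the preference flips once the weight exceeds 4, since the same-class path is 4 units longer and the signature gap of the shorter path is 1.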
In hybrid planning stacks such as those in HAM-Nav and MRReP, the HRP is fused with standard planning modules, allowing dynamic switching between user-specified and optimized sub-paths. Collision checking, velocity profile adjustment, and obstacle avoidance are addressed by the local planner (e.g., Timed Elastic Band, Regulated Pure Pursuit) (Taki et al., 31 Mar 2026). Navigation planners may also enforce weighted cost objectives such as
$$J(\pi) = w_{\mathrm{f}} \, D(\pi, P) + w_{\mathrm{s}} \, C_{\mathrm{env}}(\pi),$$
where the first term penalizes deviation of the planned path $\pi$ from the HRP $P$ and the second its environmental cost, to formally balance fidelity to user input and environmental safety.
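One way to read such a weighted objective is as a sum of a fidelity term (how far the planned path strays from the drawn one) and a safety term (how much cost-map cost it accumulates). The sketch below assumes that form; the specific deviation measure (mean distance to the nearest HRP segment), the weights, and the toy `env_cost` function are illustrative choices, not taken from the cited systems.

```python
import math

def point_to_segment(p, a, b):
    """Distance from point p to the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def deviation_from_hrp(path, hrp):
    """Mean distance from each path point to the nearest HRP segment."""
    segments = list(zip(hrp, hrp[1:]))
    return sum(min(point_to_segment(p, a, b) for a, b in segments) for p in path) / len(path)

def weighted_cost(path, hrp, env_cost, w_fidelity, w_safety):
    """Weighted sum of HRP deviation and mean environment cost along the path."""
    mean_env = sum(env_cost(p) for p in path) / len(path)
    return w_fidelity * deviation_from_hrp(path, hrp) + w_safety * mean_env

def env_cost(p):
    """Toy cost field: a costly patch sitting on the drawn line."""
    return 1.0 if 4.0 <= p[0] <= 6.0 and abs(p[1]) < 1.0 else 0.0

hrp = [(0.0, 0.0), (10.0, 0.0)]
on_hrp = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
detour = [(0.0, 0.0), (5.0, 2.0), (10.0, 0.0)]

balanced = (weighted_cost(on_hrp, hrp, env_cost, 1.0, 1.0),
            weighted_cost(detour, hrp, env_cost, 1.0, 1.0))
cautious = (weighted_cost(on_hrp, hrp, env_cost, 1.0, 5.0),
            weighted_cost(detour, hrp, env_cost, 1.0, 5.0))
```

With equal weights the path that follows the drawing is cheaper; raising the safety weight makes the detour win.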
4. Perceptual and Semantic Alignment in HRP-Based Navigation
A critical challenge is aligning the HRP, which is often inaccurate and abstract, with the environment encountered during execution. In HAM-Nav, Selective Visual Association Prompting (SVAP) builds composite image prompts that selectively overlay only high-probability nodes from the pruned topological graph on the map, juxtaposed with the current camera view. Node inclusion is governed by a retention probability
$$p_i = f(d_i, T_i),$$
where $d_i$ is pixel-space distance and $T_i$ denotes learned transition likelihoods. Only candidates with $p_i \ge \tau$, for a retention threshold $\tau$, are used; this reduces noise in the VLM's localization step (Tan et al., 31 Jan 2025).
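The pruning step can be sketched as a simple threshold filter. The exact form of the retention probability is not recoverable from the text here, so the example assumes one purely for illustration: an exponential decay in pixel distance multiplied by the transition likelihood. The `sigma` scale, the threshold value, and the node records are likewise hypothetical.

```python
import math

def retention_probability(pixel_distance, transition_likelihood, sigma=50.0):
    """ASSUMED form: exp(-d / sigma) * T. The source does not specify this function."""
    return math.exp(-pixel_distance / sigma) * transition_likelihood

def retain_nodes(nodes, threshold=0.5, sigma=50.0):
    """Ids of nodes whose retention probability meets the threshold."""
    return [
        node["id"]
        for node in nodes
        if retention_probability(node["pixel_distance"], node["transition"], sigma) >= threshold
    ]

# Hypothetical candidate nodes from a pruned topological graph.
nodes = [
    {"id": "a", "pixel_distance": 10.0, "transition": 0.9},   # near and likely
    {"id": "b", "pixel_distance": 200.0, "transition": 0.9},  # likely but far
    {"id": "c", "pixel_distance": 10.0, "transition": 0.1},   # near but unlikely
]
overlay_nodes = retain_nodes(nodes)
```

Only node `a` survives, so only it would be drawn onto the composite prompt.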
The Predictive Navigation Plan Parser (PNPP) addresses missing or ambiguous landmark annotations by using a VLM to infer likely co-occurring objects in context. The plan parser segments the HRP topology at junctions, predicts missing objects, and synthesizes human-readable multi-step directives. If a path segment is missing, PNPP prompts the VLM to generate plausible continuations, scoring candidates by their LLM likelihood.
In SkeNa, a Ray-based Map Descriptor (RMD) encodes geometric features of the sketch at a grid of sample points by casting multiple rays from each, recording free-space reachability. This descriptor, combined across the HRP sketch and the agent’s on-site exploration map, enables cross-modal alignment via a Dual-Map Aligned Goal Predictor (DAGP), based on cross-attention between RMDs. This alignment allows robust localization and target determination despite large abstraction gaps and drawing inconsistencies (Xu et al., 5 Aug 2025).
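The ray-casting idea behind such a descriptor can be sketched on an occupancy grid: from a sample cell, march along evenly spaced directions and record how far each ray travels through free space. This is a simplified illustration, not SkeNa's RMD: the step size, ray count, range cap, and the choice to record travelled distance are assumptions.

```python
import math

def ray_descriptor(grid, row, col, n_rays=8, max_range=10.0, step=0.5):
    """Free-space distance (in cells) along n_rays directions from the center of (row, col)."""
    rows, cols = len(grid), len(grid[0])
    descriptor = []
    for k in range(n_rays):
        angle = 2.0 * math.pi * k / n_rays
        d_row, d_col = math.sin(angle), math.cos(angle)
        reached = 0.0
        while reached + step <= max_range:
            r = math.floor(row + 0.5 + (reached + step) * d_row)
            c = math.floor(col + 0.5 + (reached + step) * d_col)
            # Stop at the map boundary or at the first occupied cell.
            if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] != 0:
                break
            reached += step
        descriptor.append(reached)
    return descriptor

# 5 x 5 free grid with one obstacle two cells to the right of the center.
grid = [[0] * 5 for _ in range(5)]
grid[2][4] = 1
descriptor = ray_descriptor(grid, row=2, col=2, n_rays=4)
```

Descriptors computed this way at matching grid locations on the sketch and on the exploration map could then be compared or fed to a learned aligner; the cross-attention model itself is outside the scope of this sketch.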
5. Experimental Validation and Comparative Performance
Empirical studies validate the effectiveness of HRP-guided planning and navigation across multiple axes: geometric fidelity, task success, subjective user experience, and cross-domain generalization.
- MRReP: In within-subject experiments, the MR-based HRP interface outperformed a 2D GUI. Stage B results reported median precision/recall improvements from 52.9%/59.0% (2D) to 83.6%/83.7% (MR), SUS usability scores increased from 51.3 to 75.0, and NASA-TLX workload decreased from 61.5 to 47.7 (Taki et al., 31 Mar 2026).
- HAM-Nav: Ablations on photorealistic Gazebo benchmarks showed a success rate (SR) of 80% and a Success weighted by Path Length (SPL) of 0.71 for the full system. Removal of landmark prediction or SVAP pruning led to pronounced drops in SR (as low as 5%) and SPL (0.01), demonstrating the importance of HRP-centric perceptual modules (Tan et al., 31 Jan 2025). In real-world trials, the Jackal robot achieved SR=78% and SPL=0.71, with corresponding usability assessments in the “Good–Excellent” range.
- SkeNa: On unseen indoor environments with high-abstraction sketches, the SkeNavigator agent achieved SR=8.0%, SPL=7.6% versus 3.9%/3.8% for state-of-the-art floor-plan-based methods (≈105% relative improvement in SR and 100% in SPL). Low-abstraction scenarios yielded SR=12.7%, SPL=11.9%. Ablation studies confirmed that accurate HRP alignment, via RMD and DAGP, directly drives these gains (Xu et al., 5 Aug 2025).
- Soft Homology Planning: On synthetic multi-obstacle domains, H★ achieved optimal or near-optimal path-topology trade-offs with orders-of-magnitude faster search than homology-augmented explicit methods. For a 316-node, 5-hole map, H★ required ≤316 node visits per new class (versus ≈14,269 for the exact BLK), with rollout refinement further lowering the final path length by 5–10% (Taveras et al., 2024).
Experimental metrics used across HRP studies include coverage/accuracy (TP, FP, FN, TN), SPL, SR, navigation time, HRP-within-GT ratio, and subjective usability and workload scores (SUS, NASA-TLX).
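For reference, the main quantitative metrics can be computed as below. SR is the fraction of successful episodes, and SPL follows the standard definition (success weighted by the ratio of shortest-path length to the longer of taken and shortest length, averaged over episodes). The episode records and count values are made-up inputs used only to exercise the functions.

```python
def success_rate(episodes):
    """Fraction of episodes that ended in success."""
    return sum(1 for e in episodes if e["success"]) / len(episodes)

def spl(episodes):
    """Success weighted by Path Length, averaged over all episodes."""
    total = 0.0
    for e in episodes:
        if e["success"]:
            total += e["shortest"] / max(e["taken"], e["shortest"])
    return total / len(episodes)

def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical episodes: path lengths in metres.
episodes = [
    {"success": True, "shortest": 10.0, "taken": 10.0},   # optimal
    {"success": True, "shortest": 10.0, "taken": 20.0},   # twice as long as needed
    {"success": False, "shortest": 10.0, "taken": 5.0},   # gave up early
]
sr = success_rate(episodes)
spl_value = spl(episodes)
precision, recall = precision_recall(tp=8, fp=2, fn=2)
```

Failed episodes contribute zero to SPL regardless of how short the travelled path was, which is why SPL can never exceed SR.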
6. System Architectures and Interaction Modalities
HRP-informed navigation architectures typically follow a modular structure:
- Mixed Reality Interfaces (Taki et al., 31 Mar 2026): Users specify HRPs via headset-enabled hand gestures, streaming the path to the robot’s navigation stack using middleware (e.g., ROS-TCP). The global planner in ROS2 receives HRP data and delegates tracking to local controllers such as Regulated Pure Pursuit.
- Sketch Map Parsing and Topological Overlay (Tan et al., 31 Jan 2025): Hand-drawn maps are digitized, converted into occupancy grids, and processed via unsupervised clustering and OCR to yield pose and landmark nodes. VLMs interpret composite visual prompts for localization and planning.
- Automated Sketch-Pipeline and Learning-based Alignment (Xu et al., 5 Aug 2025): Synthetic or human sketches are generated from layout and reference paths, abstracted via polylines or Bézier methods. Keypoint descriptors at grid locations support cross-modal neural alignment with constructed exploration maps.
- Homology-Constrained Graph Search (Taveras et al., 2024): User-drawn reference paths are interpreted on the discrete domain as sequences of graph vertices. Harmonic projections and soft constraints are incorporated into classical search to allow seamless tradeoffs between geometric and topological objectives.
7. Research Directions and Implications
Progress in HRP-centric navigation suggests that explicit human path specification, even when rough or abstract, can be systematically formalized, parsed, and executed by robot planners with high task fidelity. MR-based interfaces, semantic/topological planners, and deep alignment techniques all contribute to closing the gap between human spatial intention and autonomous actuation.
Future research aims include: (i) advanced path smoothing via curvature-minimizing splines, (ii) cost-augmented planners combining HRP fidelity and environmental cost measures, (iii) multi-objective human–robot co-navigation frameworks integrating semantic region constraints, and (iv) robust cross-modal alignment to bridge increasing abstraction in human input (Taki et al., 31 Mar 2026, Xu et al., 5 Aug 2025). Theoretical advances in soft homology and harmonic signatures may facilitate further generalization to cluttered, topologically complex workspaces (Taveras et al., 2024).
A plausible implication is that as interactive, learning-based, and topologically adaptive HRP pipelines mature, hand-drawn or user-specified paths are poised to become a central paradigm for human-in-the-loop robot navigation in complex, real-world environments.