
Quadrupped-Legged Robot Movement Plan Generation using Large Language Model

Published 24 Dec 2025 in cs.RO and cs.HC | (2512.21293v1)

Abstract: Traditional control interfaces for quadruped robots often impose a high barrier to entry, requiring specialized technical knowledge for effective operation. To address this, this paper presents a novel control framework that integrates LLMs to enable intuitive, natural language-based navigation. We propose a distributed architecture where high-level instruction processing is offloaded to an external server to overcome the onboard computational constraints of the DeepRobotics Jueying Lite 3 platform. The system grounds LLM-generated plans into executable ROS navigation commands using real-time sensor fusion (LiDAR, IMU, and Odometry). Experimental validation was conducted in a structured indoor environment across four distinct scenarios, ranging from single-room tasks to complex cross-zone navigation. The results demonstrate the system's robustness, achieving an aggregate success rate of over 90% across all scenarios, validating the feasibility of offloaded LLM-based planning for autonomous quadruped deployment in real-world settings.

Summary

  • The paper demonstrates a novel system that converts natural language commands into executable JSON movement plans for quadruped robots.
  • The distributed architecture offloads high-level instruction processing to remote LLMs while onboard modules handle sensor fusion and ROS-based actuation.
  • Empirical results show over 90% success rates in diverse indoor scenarios with potential for enhancements via retrieval-augmented generation and vision-language models.

LLM-Driven Natural Language Movement Planning for Quadruped Robot Navigation

Introduction

The paper "Quadrupped-Legged Robot Movement Plan Generation using Large Language Model" (2512.21293) presents a practical, distributed system architecture for integrating LLM-driven high-level instruction processing into autonomous navigation of quadruped robots. Targeting the DeepRobotics Jueying Lite 3 hardware, the authors address the gap between the computational cost of LLM inference and the limited onboard processing available in mobile robotic platforms. The system enables users to issue navigation and movement commands in natural language, which are then parsed, grounded, and transformed into robot-executable plans through cloud-based LLMs, enhancing usability for non-expert users and promoting broader adoption in structured indoor environments.

System Architecture and Design

The proposed architecture decomposes the movement plan generation and execution pipeline across a distributed set of computational modules. The perception and motion hosts, embedded within the robot, perform low-level actuation, sensor fusion (LiDAR, IMU, odometry), and localization. High-level instruction processing—including prompt design, JSON-based plan synthesis, and interaction with remote LLM APIs (Vertex AI Gemini)—is offloaded to an external development host connected via a LAN.

A web-based user interface (Figure 1) serves as the natural language input gateway, enabling accessible control over standard platforms and simplifying the robot operation paradigm.

Figure 1: The web-based interface provides natural language input capabilities for movement planning and execution.

The environment is mapped with HDL-Localization, a 3D LiDAR-based SLAM technique, to produce a high-fidelity map. Points of Interest (POIs) and waypoints are explicitly defined on this map and associated with 3D coordinates for navigation grounding (Figure 2).

Figure 2: Visualization of spatial Points of Interest within the main hall supporting semantic goal specification.
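
A semantic waypoint table of this kind can be sketched as a plain mapping from POI names to map-frame coordinates. This is a minimal illustration only: the `Waypoint` class, the `resolve` helper, and every name and coordinate below are hypothetical, since the summary does not describe the paper's actual waypoint file format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Waypoint:
    """A named point of interest with coordinates in the map frame (metres)."""
    name: str
    x: float
    y: float
    z: float = 0.0


# Hypothetical POIs; real coordinates would be read off the SLAM map.
WAYPOINTS: Dict[str, Waypoint] = {
    wp.name: wp
    for wp in (
        Waypoint("pantry", 12.4, -3.1),
        Waypoint("elevator", 25.0, 0.8),
        Waypoint("lab_door", 4.2, 6.5),
    )
}


def resolve(name: str) -> Waypoint:
    """Look up a POI by name, ignoring case and surrounding whitespace."""
    key = name.strip().lower()
    if key not in WAYPOINTS:
        raise KeyError(f"unknown waypoint: {name!r}")
    return WAYPOINTS[key]
```

Failing loudly on an unknown name, rather than guessing a nearby one, keeps a misspelled or invented location from turning into a navigation goal.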

The system prompt to the LLM is meticulously engineered to restrict plan generation to valid action primitives, incorporate contextual task constraints, and leverage few-shot examples to ensure reliable decomposition of user commands into robust, semantically correct JSON plans. These plans are consumed by a ROS-based movement executor, completing the grounding loop from intention to actuation.
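
The three ingredients named above (allowed primitives, contextual constraints, few-shot examples) can be assembled roughly as follows. This is a sketch under stated assumptions, not the authors' prompt: the wording is invented, `build_system_prompt` is a hypothetical helper, and the `goto` and `wait` command names follow the example actions quoted elsewhere on this page.

```python
import json
from typing import Iterable

# Assumption: the action vocabulary is limited to these two commands.
ALLOWED_ACTIONS = ("goto", "wait")


def build_system_prompt(waypoints: Iterable[str]) -> str:
    """Compose a system prompt from primitives, constraints, and one example."""
    few_shot_plan = [
        {"command": "goto", "parameters": {"waypoint": "pantry"}},
        {"command": "wait", "parameters": {"seconds": 10}},
    ]
    lines = [
        "You are a motion planner for a quadruped robot.",
        "Reply with a JSON array of steps and nothing else.",
        "Allowed commands: " + ", ".join(ALLOWED_ACTIONS) + ".",
        "Known waypoints: " + ", ".join(sorted(waypoints)) + ".",
        "Never use a waypoint that is not in the list above.",
        "Example instruction: Go to the pantry and wait ten seconds.",
        "Example plan: " + json.dumps(few_shot_plan),
    ]
    return "\n".join(lines)
```

Listing the known waypoints inside the prompt gives the model a closed vocabulary to choose from, which is one plausible way to discourage hallucinated locations.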

Experimental Validation

The implemented system is evaluated on the 9th floor of ITS Tower 2, a structured indoor space featuring multiple functional zones, including laboratories, restrooms, a pantry, and elevators. Four experimental scenarios are defined by navigation complexity and spatial extent: single-room, multi-room short-distance, multi-room long-distance, and cross-zone navigation.

Key metrics are the average task completion time and overall success rate, capturing both system efficiency and robustness.

Figure 3: Average task completion times across scenario categories, reflecting scaling with navigational complexity.

Figure 4: Empirical success rates by scenario, indicating high reliability over varied task complexity.

Strong Numerical Results

  • The aggregate success rate exceeds 90%, with two categories (single-room and cross-zone) attaining 100% over roughly 15 to 20 trials.
  • Average completion times scale as expected—single room at 45.26s, cross-zone at 130.98s.
  • LLM-generated plans are consistently semantically valid, even in failure cases, affirming robust prompt and plan design.

The only notable failure mode arises in complex, multi-room scenarios. It is traceable not to the LLM or high-level planning but to limitations in local navigation strategies, including path revalidation and persistent localization.

Practical and Theoretical Implications

The research demonstrates that the offloading of high-level natural language instruction processing to external LLM resources does not compromise the operational autonomy of quadruped platforms, provided communication latencies are bounded and onboard navigation pipelines are efficient. This architecture is particularly advantageous for resource-constrained mobile platforms that cannot afford dedicated onboard GPUs or ASIC accelerators for LLM inference.

On the theoretical front, the work supports the emerging paradigm in HRI and robotics that LLMs, when suitably constrained and prompted, can bridge the grounding gap between abstract human intent and machine-executable action plans even in domains as physically complex as quadruped locomotion.

Future Developments

The authors propose integration of retrieval-augmented generation (RAG) for historical context accumulation, which could enable the system to maintain session-based memory and reasoning across multi-turn interactions, an essential step toward natural, contextually coherent service robots. Another extension is the integration of vision-language models (VLMs) for visual grounding and situational awareness, unlocking real-time perceptual adaptation during plan execution and supporting richer interaction in unstructured or dynamic environments.

Conclusion

This research establishes a comprehensive, technically robust pipeline for LLM-mediated movement plan generation in quadruped robots, validated through high success rates and efficient operation across diverse navigation tasks. The offloading architecture democratizes high-level robot control and offers a clear pathway for further extensions in autonomous robotics, particularly as multi-modal and memory-augmented models become more deeply integrated in HRI pipelines.

Explain it Like I'm 14

What is this paper about?

This paper shows a new, easier way to control a dog-like, four-legged robot. Instead of using complicated controls or programming, people can simply type what they want in normal language (like “Go to the pantry, then to the lab”), and an AI system turns those words into step-by-step moves the robot can follow. Because the robot’s small computer isn’t powerful enough to run the AI by itself, the heavy thinking happens on another computer (a server), while the robot handles the actual moving and sensing.

What did the researchers want to know?

  • Can regular people give natural-language instructions and have a quadruped robot understand and act correctly?
  • Can a large language model (LLM) safely turn those instructions into a clear, step-by-step plan the robot can execute?
  • If the AI “brain” runs on an external server, can the robot still move smoothly and reliably in real time?
  • How often does the system succeed in different types of tasks, from simple (within one room) to complex (across multiple areas)?

How did they do it?

The robot and its “senses”

  • The robot is a quadruped called the Jueying Lite 3.
  • It has sensors:
    • LiDAR: like a spinning laser “scanner” that measures distances to build a 3D picture of the room (think of it as the robot’s “laser eyes”).
    • IMU: like an inner ear that helps it know its tilt and movement.
    • Odometry: keeps track of how far it has walked.
  • These sensors are combined (sensor fusion) so the robot knows where it is and where obstacles are.

Teaching the robot the building

  • The team first made a detailed indoor map using SLAM (Simultaneous Localization and Mapping). You can imagine SLAM like drawing your own floor plan while walking around.
  • They marked important places on the map as “waypoints,” like “Lab Door,” “Pantry Shelf,” or “Elevator.” A waypoint is simply a named spot with coordinates on the map—like a pin dropped on a digital floor plan.

Talking to the robot

  • A simple web page (made with Flask) lets a user type instructions in everyday language (they used Indonesian in the tests).
  • An LLM (Google Vertex AI Gemini) reads the instruction and outputs a plan in JSON (a neat, computer-friendly list of steps). Example actions look like:
    • command: "goto", parameters: { waypoint: "pantry" }
    • command: "wait", parameters: { seconds: 10 }
  • The researchers designed careful prompts so the LLM:
    • Uses only allowed actions (like goto, wait).
    • Avoids making up fake places.
    • Follows examples that show how to break a long instruction into several steps.
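
Even with careful prompts, it is sensible to check the plan before the robot moves. Here is a small sketch of such a check. It assumes the plan is a JSON list of steps shaped like the examples above; the `validate_plan` function and its error messages are made up for illustration and are not taken from the paper.

```python
import json
from typing import Any, Dict, List, Set


def validate_plan(raw: str, known_waypoints: Set[str]) -> List[Dict[str, Any]]:
    """Parse LLM output and reject anything outside the allowed vocabulary."""
    plan = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(plan, list):
        raise ValueError("plan must be a JSON list of steps")
    for i, step in enumerate(plan):
        if not isinstance(step, dict):
            raise ValueError(f"step {i}: expected an object")
        command = step.get("command")
        params = step.get("parameters", {})
        if not isinstance(params, dict):
            raise ValueError(f"step {i}: parameters must be an object")
        if command == "goto":
            waypoint = params.get("waypoint")
            # Catch made-up places before they reach the navigation stack.
            if not isinstance(waypoint, str) or waypoint not in known_waypoints:
                raise ValueError(f"step {i}: unknown waypoint {waypoint!r}")
        elif command == "wait":
            seconds = params.get("seconds")
            is_number = isinstance(seconds, (int, float)) and not isinstance(seconds, bool)
            if not is_number or seconds < 0:
                raise ValueError(f"step {i}: invalid wait time {seconds!r}")
        else:
            raise ValueError(f"step {i}: command {command!r} is not allowed")
    return plan
```

For example, `validate_plan('[{"command": "goto", "parameters": {"waypoint": "moon"}}]', {"pantry"})` raises `ValueError`, because "moon" is not a known waypoint.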

Who does the heavy thinking?

  • The robot has two onboard computers:
    • Motion host: talks to motors and sensors.
    • Perception host: handles mapping and planning.
  • A third machine, the “development host” (off the robot), runs the LLM. Think of it like calling a smart friend for advice: the robot asks the server, “What’s the plan?”, then follows it locally.
  • This setup is called “offloading” because the AI work is moved off the robot to a stronger computer.

Turning plans into motion

  • The JSON plan from the LLM is sent to ROS (Robot Operating System), which is like the robot’s operating system for movement.
  • ROS uses the map and waypoints to plan paths and send safe, step-by-step movement commands.
  • As the robot moves, its sensors keep it on track and help avoid obstacles.
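
The hand-off from plan to motion can be pictured as a loop over the steps. The sketch below is not the paper's executor and does not call ROS directly. `send_goal` is a hypothetical stand-in for whatever sends a target to the ROS navigation stack and waits for the result, and the step format is assumed to match the JSON examples above.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Coordinates = Tuple[float, float, float]


def execute_plan(
    plan: List[Dict[str, Any]],
    waypoints: Dict[str, Coordinates],
    send_goal: Callable[[Coordinates], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run the steps in order; stop and report failure if a goal is not reached."""
    for step in plan:
        params = step["parameters"]
        if step["command"] == "goto":
            # Turn the waypoint name into coordinates, then hand them to navigation.
            if not send_goal(waypoints[params["waypoint"]]):
                return False
        elif step["command"] == "wait":
            sleep(params["seconds"])
        else:
            raise ValueError(f"unsupported command: {step['command']!r}")
    return True
```

Passing `send_goal` and `sleep` in as arguments means the loop can be tried on a laptop with fake functions before it is ever connected to a real robot.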

What did they find?

They tested four kinds of tasks in a real building (labs, hallways, pantry, restrooms, elevator):

  • Single-room short tasks: 100% success, average 45.26 seconds
  • Multi-room short tasks (nearby places): 96% success, average 68.27 seconds
  • Multi-room long tasks (farther apart): 90% success, average 89.71 seconds
  • Cross-zone tasks (several areas in one trip): 100% success, average 130.98 seconds

Overall, the system succeeded in more than 90% of trials. As tasks got longer and more complex, they took more time (which makes sense), but success stayed high. When failures happened, they mostly came from local navigation issues (like adjusting paths), not from the LLM’s planning. The LLM’s JSON plans were consistently correct and usable.
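
As a quick arithmetic check, the four reported figures can be averaged directly. The exact number of trials per scenario is not given here, so this is a simple unweighted average across scenarios, not the paper's own aggregate figure.

```python
# (success rate in %, average completion time in seconds), as reported above.
results = {
    "single-room": (100.0, 45.26),
    "multi-room short": (96.0, 68.27),
    "multi-room long": (90.0, 89.71),
    "cross-zone": (100.0, 130.98),
}

mean_success = sum(rate for rate, _ in results.values()) / len(results)
mean_time = sum(seconds for _, seconds in results.values()) / len(results)

# How much longer the longest scenario takes compared with the shortest.
slowdown = results["cross-zone"][1] / results["single-room"][1]

print(f"unweighted mean success: {mean_success:.1f}%")
print(f"unweighted mean time: {mean_time:.2f} s")
print(f"cross-zone vs single-room time: {slowdown:.2f}x")
```

The unweighted mean success rate comes out at 96.5%, consistent with the "more than 90%" claim, and cross-zone tasks take about 2.9 times as long as single-room tasks.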

Why this matters:

  • It proves that natural-language control for a quadruped robot can work well in the real world.
  • Offloading the AI to an external server makes this possible even with a lightweight, cheaper robot computer.

What does this mean for the future?

  • Easier robot use: People without programming skills can guide robots in labs, offices, schools, or hospitals just by typing or speaking instructions.
  • Lower cost and weight: Robots don’t need big, power-hungry onboard GPUs because the AI can run elsewhere.
  • More capable service robots: The team plans to add:
    • RAG (Retrieval-Augmented Generation) so the robot can remember previous instructions and use them as context.
    • Vision-language models (VLMs) so the robot can understand what it sees (e.g., “That’s the pantry shelf”) and adjust on the fly.

In short, this research shows a practical way to make dog-like robots more helpful and easier to control, bringing us closer to everyday, human-friendly robot helpers.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a focused list of what remains missing, uncertain, or unexplored in the paper, articulated as concrete, actionable items for future research.

  • Generalization beyond a single, structured indoor site: Evaluate in diverse buildings, unstructured layouts, outdoor settings, and cluttered/dynamic environments.
  • Dynamic obstacle handling and closed-loop plan repair: Integrate real-time feedback to update or re-plan LLM-generated sequences when the environment changes mid-execution.
  • Expressivity of the action vocabulary: Extend beyond goto and wait to support conditional logic, loops, branching, error recovery, and task-level optimization (e.g., route reordering).
  • Legged locomotion constraints: Assess compatibility between ROS move_base-style planning and quadruped footstep/gait constraints; add terrain-aware locomotion, stairs, and narrow passage handling.
  • Cross-floor navigation realism: Move from “go to elevator” waypoints to actual elevator interaction (call, ride, exit) or stair traversal, with safety and reliability guarantees.
  • Robustness to natural language variability: Test multilingual inputs, speech-to-text pipelines, colloquial phrasing, typos, synonyms, and ambiguous commands; implement clarification dialogues.
  • Safety guarantees and formal verification: Establish formal constraints, runtime safety monitors, and certification-ready evidence that LLM plans cannot generate unsafe behaviors or hallucinated waypoints.
  • Cloud dependency and latency: Quantify end-to-end latency (LLM inference + network roundtrip) and its impact on responsiveness; evaluate offline fallback (local or edge models), caching, and degraded modes.
  • Security, access control, and adversarial inputs: Harden the Flask interface and prompt pipeline against unauthorized access, prompt injection, and malicious commands; define authentication and role-based restrictions.
  • JSON schema adherence and recovery: Provide guarantees (or detectors) for strictly valid JSON outputs; implement auto-correction, schema validation, and fallback behaviors for malformed or incomplete plans.
  • Mapping and semantic waypoint upkeep: Automate POI creation, update, and synchronization when the environment changes; measure localization drift and its effect on waypoint accuracy.
  • Baselines and comparative evaluation: Compare LLM-based interface against traditional GUIs/teleoperation, symbolic planners, or rule-based parsers on task success, efficiency, and user burden.
  • Metrics beyond success rate and average time: Report LLM inference time, network bandwidth/latency, CPU/GPU utilization, battery consumption, localization accuracy, and failure mode taxonomy.
  • Failure analysis depth: Provide root-cause analysis for multi-room failures (e.g., localization errors, costmap issues, LLM misinterpretation); quantify how each subsystem contributes to failure rates.
  • Route optimality and efficiency: Assess whether LLM plan ordering is near-optimal; integrate path cost awareness (e.g., travel time, congestion, energy) into the planner.
  • Social navigation and HRI: Evaluate behavior around people (comfort distance, etiquette, yielding), and conduct user studies on intuitiveness, trust, workload, and satisfaction.
  • Privacy and data governance: Analyze risks of sending user commands/context to cloud LLMs; define data retention, anonymization, and compliance policies.
  • Scalability and multi-user/robot coordination: Test concurrent users, command arbitration, multi-robot tasking, and scheduling under shared resources.
  • Reproducibility and transparency: Release the exact system prompt(s), code, map assets, and datasets of commands; specify model versions, parameters (e.g., temperature), and ROS planner configurations.
  • Integration with perception (VLM) and memory (RAG): Move beyond stated future work to empirically validate how visual grounding and long-horizon memory improve plan correctness and robustness.
  • Handling of no-go zones and permissions: Incorporate semantic constraints (restricted areas, time-based access, gender-specific restrooms) into planning and user feedback.
  • Robustness to network outages: Define timeouts, retries, local buffering, and safe fallback behaviors when connectivity to the LLM service is lost mid-task.
  • Task compositionality and object interaction: Explore object search, recognition, and manipulation (where feasible for the platform), including alignment between language goals and sensory evidence.
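
To make the network-outage item above concrete, one possible shape for a bounded retry with a safe fallback is sketched here. Nothing in it comes from the paper: `call_llm` is a hypothetical function that sends the instruction to the remote planner, and the retry count and backoff values are arbitrary.

```python
import time
from typing import Callable, Optional


def request_plan_with_retry(
    call_llm: Callable[[str], str],
    instruction: str,
    retries: int = 3,
    backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try the remote planner a few times; return None so the caller can stop safely."""
    for attempt in range(retries):
        try:
            return call_llm(instruction)
        except (TimeoutError, ConnectionError):
            # Exponential backoff between attempts, but no pause after the last one.
            if attempt < retries - 1:
                sleep(backoff_s * 2 ** attempt)
    return None
```

Returning `None` instead of raising leaves the decision to the caller, for example holding position and notifying the operator when no plan could be obtained.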

Practical Applications

Immediate Applications

Below are concrete, deployable use cases that can be implemented now with the paper’s architecture (LLM offloading, ROS navigation, 3D LiDAR SLAM, semantic waypoints, Flask web interface, cloud LLM).

  • Sector: Education and Research — Natural-language building guide on campuses
    • Use case: “Guide me to TW903, then the elevator,” with multi-room and cross-zone navigation (validated in the paper).
    • Tools/products/workflows: Semantic waypoint map of campus building; Flask web UI; LLM-to-ROS bridge publishing JSON “goto” actions; RViz-based localization; HDL-Localization for 3D maps.
    • Assumptions/dependencies: Pre-mapped indoor environment; stable Wi‑Fi for LLM calls; safe robot routes; staff oversight during initial deployment.
  • Sector: Enterprise/Office — Autonomous indoor courier and errand-running
    • Use case: Delivering items between labs/offices/pantry; “Go to lab shelf, then pantry, then elevator nearest to pantry.”
    • Tools/products/workflows: “LLM Navigation Controller” (server) + ROS move_base; semantic POI catalog (lab shelves, assembly tables, pantries); operator web UI.
    • Assumptions/dependencies: Defined routes and POIs; hallway clearance; elevator integration via human assistance or external control system; battery management scheduling.
  • Sector: Industrial Facilities — Facility inspection and routine patrols
    • Use case: Natural-language patrol sequences across rooms and corridors (e.g., restrooms, pantry, elevators).
    • Tools/products/workflows: Patrol plan templates with LLM prompt; JSON action primitives (“goto,” “wait”); ROS logs for audit; time-based schedule triggers.
    • Assumptions/dependencies: Reliable localization; updated semantic waypoints; policies for human-robot interaction in shared spaces.
  • Sector: Hospitality/Museums — Robotic docent or wayfinding assistant
    • Use case: Escort visitors across exhibits or rooms using spoken or typed requests.
    • Tools/products/workflows: Visitor mobile interface; content-linked POIs; operator override via RViz; fleet dashboard for multiple robots.
    • Assumptions/dependencies: Crowd-aware speed limits; signage and staff processes; consistent connectivity.
  • Sector: Healthcare (non-clinical logistics) — Portering within structured hospital floors
    • Use case: Moving supplies/specimens between wards and labs using natural-language tasks (pilot-level deployments).
    • Tools/products/workflows: Hospital floor POIs; governance guardrails in the LLM prompt (restricted zones, infection control).
    • Assumptions/dependencies: Compliance approvals; staff co-supervision; map accuracy; robust handoff workflow at elevators/doors.
  • Sector: Software/Robotics — Developer tooling derived from the paper
    • Use case: Packaging the LLM-to-ROS bridge and prompt templates; a “Semantic Waypoint Editor” for ROS maps; Flask UI starter kit.
    • Tools/products/workflows: Open-source modules for JSON action parsing; CI-tested prompt templates; map/POI annotation workflow; deployment scripts for development host.
    • Assumptions/dependencies: ROS ecosystem familiarity; cloud LLM API (e.g., Vertex AI Gemini); LAN/Wi‑Fi configuration.
  • Sector: Policy/Operations — Reducing training burden for robot operators
    • Use case: Natural-language control for non-experts to execute multi-step navigation plans.
    • Tools/products/workflows: Standardized command taxonomies; operator SOPs for mapping, localization, and mission start/stop; audit logs of LLM JSON actions.
    • Assumptions/dependencies: Safety gating in prompts (no hallucinated waypoints); versioned semantic maps; escalation paths for failures (e.g., multi-room revalidation).
  • Sector: Daily Life — Building errands and assistance in dorms/residential complexes
    • Use case: “Pick up package at lobby shelf, bring to pantry, then wait by elevator.”
    • Tools/products/workflows: Tenant mobile web app; POI setup in shared areas; action limits and timeout rules.
    • Assumptions/dependencies: Building access rules; shared-space etiquette; connectivity; noise and speed constraints.

Long-Term Applications

These applications require further research, scaling, or development (e.g., adding VLM/RAG, improving local navigation recovery, edge inference, multi-robot coordination, regulatory alignment).

  • Sector: Robotics/Software — Onboard (edge) LLM planning to remove cloud dependency
    • Use case: Running compressed LLMs on Jetson (e.g., Orin) for offline mission planning.
    • Tools/products/workflows: Model distillation/quantization; caching prompt results; edge inference orchestration.
    • Assumptions/dependencies: Sufficient GPU/TPU resources; acceptable latency; robust fallback behavior.
  • Sector: Robotics/Perception — Visual-LLM integration for dynamic grounding
    • Use case: “Go to the table with the red toolbox,” on-the-fly visual POI detection/captioning (as envisioned in the paper).
    • Tools/products/workflows: Camera calibration; VLM grounding to ROS frames; visual-semantic mapping updates; perception safety filters.
    • Assumptions/dependencies: Additional sensors/cameras; compute headroom; data privacy policies for visual data.
  • Sector: Enterprise/Healthcare/Hospitality — RAG-enabled personalized and context-aware missions
    • Use case: Remember user preferences, past routes, blocked areas, and schedule; adapt plans over long horizons.
    • Tools/products/workflows: Mission memory store; RAG pipelines; access controls; PII governance.
    • Assumptions/dependencies: Data retention policies; user consent; failure-aware memory updates.
  • Sector: Logistics/Operations — Multi-robot fleet coordination via natural language
    • Use case: LLM dispatcher produces coordinated JSON plans across robots (“Robot A checks pantry, Robot B patrols restrooms”).
    • Tools/products/workflows: Multi-agent ROS; conflict resolution; shared map state; task allocation algorithms.
    • Assumptions/dependencies: Reliable inter-robot comms; leader election; fleet safety and collision avoidance.
  • Sector: Public Safety/Disaster Response — Semi-structured search-and-rescue
    • Use case: Quadrupeds navigate debris-laden or partially mapped areas with natural-language directives.
    • Tools/products/workflows: Robust SLAM under degradation; satellite/mesh comms; human-in-the-loop oversight; mission triage prompts.
    • Assumptions/dependencies: Terrain-capable hardware; resilience to occlusions; emergency policies and liability frameworks.
  • Sector: Smart Buildings/Facilities — Digital twin integration for semantic maps
    • Use case: Automatic ingestion of BIM/IFC data to generate and maintain POIs and navigation constraints.
    • Tools/products/workflows: Map–twin sync services; zone permissions; dynamic route optimization (e.g., closed corridors).
    • Assumptions/dependencies: Building data access; standards alignment; change management process.
  • Sector: Standards/Policy — Natural-language command taxonomies and safety guardrails
    • Use case: Industry-wide schemas for JSON actions, restricted intents, and verification checks before execution.
    • Tools/products/workflows: Prompt governance; safety validators (e.g., waypoint existence, collision risk); audit trail standards.
    • Assumptions/dependencies: Multi-stakeholder adoption; regulatory input; interoperability across vendors.
  • Sector: Energy/Operations — Mission planning with battery-aware scheduling
    • Use case: LLM plans multi-stop routes balancing charge levels and dock availability.
    • Tools/products/workflows: Battery telemetry; charging dock mapping; time-window constraints; energy-aware planners.
    • Assumptions/dependencies: Accurate energy models; dock infrastructure; forecast of corridor traffic.
  • Sector: Localization/Mapping — Autonomous semantic map building and continual refinement
    • Use case: Robots learn and label new POIs from operator dialogue (“Mark this table as ‘assembly table’”).
    • Tools/products/workflows: Interactive map editors with language input; validation protocols; versioned map releases.
    • Assumptions/dependencies: Operator training; anti-hallucination checks; rollback mechanisms.
  • Sector: Global Operations — Cross-lingual natural-language control
    • Use case: Multi-language support beyond Indonesian for multicultural deployments.
    • Tools/products/workflows: Language-specific prompt packs; POI name localization; tokenizer/encoding evaluation.
    • Assumptions/dependencies: LLM language coverage quality; cultural and signage differences; translation consistency.
  • Sector: Commercialization — LLM Navigation Server (SaaS) for robotics vendors
    • Use case: Hosted planning service that converts natural-language tasks to ROS action plans for diverse robot hardware (beyond Jueying Lite 3).
    • Tools/products/workflows: Hardware-agnostic adapters; API SLAs; customer onboarding (mapping/POI setup).
    • Assumptions/dependencies: Vendor integrations; pricing viability; security and uptime guarantees.

Glossary

  • Action Primitives: Basic atomic robot behaviors used to structure LLM outputs. "Action Primitives: A defined list of valid robot behaviors (navigation, exploration, halting)."
  • Contextual Constraints: Rules that prevent unsafe or hallucinated outputs from the LLM. "Contextual Constraints: Rules preventing the generation of hallucinated or unsafe waypoints."
  • DeepRobotics Jueying Lite 3: A commercially available quadruped robot platform used in the experiments. "DeepRobotics Jueying Lite 3 platform"
  • Development host: The computing unit that handles LLM requests and hosts the web interface. "the development host is responsible for handling LLM requests."
  • Distributed architecture: A system design where computation is spread across multiple machines, with heavy tasks offloaded. "We propose a distributed architecture where high-level instruction processing is offloaded to an external server"
  • Flask-based web server: A Python web server (Flask) providing the robot’s natural language interface. "Flask-based web server hosted by the robot."
  • Global planning algorithms: Path-planning methods that operate over the entire map to generate routes. "global planning algorithms"
  • Grounding: Linking language instructions to executable actions and environmental affordances. "capable of "grounding" abstract instructions into actionable robotic sequences"
  • HDL-Localization: A 3D LiDAR-based SLAM algorithm for mapping and localization. "We utilize HDL-Localization, a 3D LiDAR-based SLAM technique"
  • IMU: Inertial Measurement Unit; a sensor for acceleration and rotation used in localization. "IMU sensor"
  • LiDAR: Light Detection and Ranging; a laser-based distance sensor for mapping and navigation. "particularly the LiDAR and IMU"
  • LLM API: A cloud-hosted interface to call an LLM. "cloud-hosted LLM API"
  • LLM inference: The runtime computation performed by an LLM to produce outputs. "offloads the computationally intensive LLM inference to an external server"
  • LLMs: Foundation models that process natural language and act as high-level planners. "LLMs to enable intuitive, natural language-based navigation."
  • Map frame: The global coordinate frame of the map used by ROS for navigation. "associated with a global coordinate (x, y, z) in the map frame"
  • Motion host: The on-robot computer interfacing with actuators and low-level sensors. "The motion host is responsible as the main connection to the robot motion actuator and sensor fleet"
  • move_base: A ROS navigation component/topic for sending goals and executing paths. "ROS move_base topic"
  • NVIDIA Jetson Xavier NX: An embedded GPU computing module used for real-time perception tasks. "The perception host utilizes an NVIDIA Jetson Xavier NX"
  • Odometry: Estimation of robot motion over time from sensor data. "(LiDAR, IMU, and Odometry)"
  • Perception host: The on-robot computer handling localization, sensor fusion, and planning. "All the sensor fusion and processing used for localization and planning are running inside the perception host."
  • Points of Interest (POIs): Semantically labeled locations used as navigation targets. "points of interest (POIs) such as laboratories, pantries, and elevators"
  • ROS: Robot Operating System; middleware for robot software integration and communication. "ROS-compatible SLAM tools"
  • ROS Master: The central ROS process that coordinates nodes and topics. "Once the sensor data is available to the ROS Master"
  • ROS navigation stack: A set of ROS packages for mapping, planning, and control. "the user launches the ROS navigation stack"
  • ROS topics: Publish/subscribe communication channels in ROS. "via ROS topics"
  • RViz: A ROS visualization tool for maps, localization, and sensor data. "using RViz"
  • SayCan: A framework connecting language to robot actions based on affordances. ""SayCan" framework"
  • Semantic Waypoints: Labeled map points representing meaningful locations for navigation. "semantic waypoints representing key points of interest (POIs)"
  • Sensor fusion: Combining multiple sensor modalities to improve state estimation. "real-time sensor fusion (LiDAR, IMU, and Odometry)"
  • SLAM: Simultaneous Localization and Mapping; building a map while localizing within it. "SLAM technique"
  • System prompt: The initial instruction that configures the LLM’s role and output format. "We employ a carefully engineered system prompt that instructs the LLM (Vertex AI Gemini) to function as a motion planner."
  • Vertex AI Gemini: Google’s LLM platform used for motion planning. "Vertex AI Gemini"
  • Visual LLM: A model that integrates vision and language for perception-informed planning. "Visual LLM"
