- The paper introduces PRISM, a framework for distilling large language models (LLMs) into small language models (SLMs) to enable on-device robot planning with minimal human intervention.
- PRISM-distilled SLMs achieve over 93% of GPT-4o's planning success rate across diverse robot domains and environments while running efficiently (under 5GB memory, real-time latency) on standard robot hardware.
- The method automates the entire process via synthetic data generation from an LLM, offering scalability and robustness to network failures compared to cloud-dependent LLMs.
Distilling On-device LLMs for Robot Planning with Minimal Human Intervention
This paper introduces PRISM, a framework for distilling small language models (SLMs) from LLM-enabled planners, enabling on-device robot planning with minimal human supervision. The motivation stems from the computational and infrastructural limitations of deploying LLMs such as GPT-4o on physical robots, particularly in environments with unreliable or unavailable network connectivity. PRISM addresses this by automating both the synthesis of training data and the distillation of SLMs that can serve as drop-in replacements for the LLM in existing robot planning pipelines.
Methodology
PRISM operates in three stages: scenario generation, plan elicitation, and planner distillation.
- Scenario Generation: Given the action and observation spaces of an LLM-enabled planner, PRISM uses an LLM to synthesize diverse tasks and textual environment representations (e.g., scene graphs, object sets). The generator is prompted to ensure semantic coherence and adherence to the planner’s input format. This process is fully automated and does not require manual dataset curation or simulators.
- Plan Elicitation: For each synthesized scenario, PRISM interacts with the source LLM-enabled planner to elicit plans. It masks parts of the environment to simulate partial observability, then iteratively queries the planner in a closed-loop fashion, updating observations based on the planner’s actions. This yields a dataset of task-observation-action sequences, with plan validation filtering out invalid or incomplete rollouts (stages one and two are sketched in code after this list).
- Planner Distillation: The collected dataset is used to fine-tune a target SLM via supervised fine-tuning (SFT), minimizing token-level cross-entropy over action predictions (a standard form of this objective is written out below). The process leverages parameter-efficient techniques such as LoRA to reduce training cost, yielding distilled models with a memory footprint under 5GB.
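To make the data-synthesis loop concrete, the sketch below illustrates stages one and two under stated assumptions: the prompts, the JSON scenario schema, the masking and validation helpers, and the scenario count are all invented here for illustration, and only the overall structure (generate a scenario, mask the environment, query the source planner in a closed loop, keep validated rollouts) follows the description above.

```python
# Minimal sketch of PRISM-style data synthesis (stages 1 and 2).
# Prompts, the scenario schema, and the helper functions are illustrative
# assumptions, not the paper's implementation.
import json
import random
from openai import OpenAI

client = OpenAI()  # the source LLM acts as both scenario generator and planner
MODEL = "gpt-4o"

ACTIONS = "goto(region), inspect(object), map_region(region), done()"       # assumed action space
OBSERVATIONS = "JSON scene graph with lists of 'objects' and 'regions'"     # assumed observation format


def generate_scenario() -> dict:
    """Stage 1: synthesize a task plus a textual environment representation."""
    prompt = (
        "You create training scenarios for a robot planner.\n"
        f"Action space: {ACTIONS}\nObservation format: {OBSERVATIONS}\n"
        "Return a JSON object with keys 'task', 'objects', and 'regions' "
        "that are semantically coherent."
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)


def mask(scenario: dict, keep: float = 0.5) -> dict:
    """Hide part of the scene to simulate partial observability (placeholder logic)."""
    objs = scenario["objects"]
    visible = random.sample(objs, max(1, int(len(objs) * keep))) if objs else []
    return {"objects": visible, "regions": scenario["regions"]}


def elicit_rollout(scenario: dict, max_steps: int = 10) -> list[dict]:
    """Stage 2: closed-loop plan elicitation from the source planner."""
    observation = mask(scenario)
    hidden = [o for o in scenario["objects"] if o not in observation["objects"]]
    rollout = []
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{
                "role": "user",
                "content": f"Task: {scenario['task']}\n"
                           f"Observation: {json.dumps(observation)}\n"
                           "Respond with the single next action.",
            }],
        )
        action = resp.choices[0].message.content.strip()
        rollout.append({"task": scenario["task"],
                        "observation": json.dumps(observation),
                        "action": action})
        if action.startswith("done"):
            break
        if hidden:  # reveal one hidden object per step as a stand-in for real feedback
            observation["objects"].append(hidden.pop())
    return rollout


def validate(rollout: list[dict]) -> bool:
    """Keep only rollouts that terminate cleanly (placeholder check)."""
    return bool(rollout) and rollout[-1]["action"].startswith("done")


dataset = []
for _ in range(1000):  # the number of scenarios is illustrative
    rollout = elicit_rollout(generate_scenario())
    if validate(rollout):
        dataset.extend(rollout)
```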
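The distillation objective in the third stage is ordinary supervised fine-tuning; one standard way to write the token-level cross-entropy loss over the elicited rollouts (the notation here is generic, not necessarily the paper's) is

$$
\mathcal{L}(\theta) \;=\; -\sum_{(\tau,\, o_{1:T},\, a_{1:T}) \in \mathcal{D}} \; \sum_{t=1}^{T} \log p_\theta\!\left(a_t \mid \tau,\, o_{1:t},\, a_{1:t-1}\right),
$$

where $\tau$ is the task description, $o_{1:t}$ the observations seen so far, $a_t$ the planner's next action (itself scored autoregressively token by token), and $\mathcal{D}$ the synthesized dataset.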
Experimental Evaluation
PRISM is evaluated on three LLM-enabled planners across distinct domains:
- SPINE: Language-driven navigation, mapping, and exploration on both ground (UGV) and aerial (UAV) robots in indoor and outdoor environments.
- SayCan: Hierarchical manipulation tasks in a simulated tabletop environment.
- LLM-Planner: Household assistance tasks in the ALFRED simulator, requiring multi-step object manipulation and navigation.
The evaluation compares three configurations: the original LLM-enabled planner (GPT-4o), the same planner with an undistilled SLM (Llama-3.2-3B), and the planner with a PRISM-distilled SLM. The primary metric is planning success rate.
Key Results
- Performance: PRISM-distilled SLMs achieve over 93% of the planning success rate of GPT-4o across all three domains, a substantial improvement over undistilled SLMs, which achieve only 10–20% of LLM performance.
- Efficiency: The distilled SLMs run in real-time on robot hardware, with latency within 200ms of GPT-4o under ideal network conditions and over 1s faster under realistic (high-latency) conditions. This enables deterministic, network-independent planning.
- Generalization: The distilled planners generalize across heterogeneous robotic platforms and diverse environments, including both indoor and outdoor settings.
- Ablation: Removing environment masking or plan validation from the data synthesis pipeline significantly degrades performance, highlighting the importance of interactive, validated data for effective distillation.
Implementation Details
- Data Synthesis: All training data is generated synthetically via LLM prompting, requiring only a high-level configuration of the planner’s action and observation spaces.
- Fine-tuning: SLMs are fine-tuned using the Unsloth library (built on Hugging Face Transformers and PyTorch) with LoRA for parameter-efficient adaptation. Hyperparameters are tuned per domain, with typical settings of 5 epochs, learning rates in the range 1e-4 to 2e-4, and LoRA ranks between 16 and 32 (a representative configuration is sketched after this list).
- Deployment: The resulting SLMs are deployed on standard robot compute platforms (e.g., Nvidia Jetson Orin NX, RTX 4000) with memory footprints under 5GB, enabling on-device inference without reliance on cloud infrastructure (see the inference sketch below).
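For concreteness, the following is a minimal fine-tuning sketch using Hugging Face Transformers with PEFT/LoRA (the paper reports using Unsloth, which wraps these libraries). The epoch count, learning rate, and LoRA rank are taken from the ranges reported above; the checkpoint name, target modules, prompt template, batch size, and placeholder data are assumptions introduced here.

```python
# Hedged sketch of the distillation step: LoRA SFT of a 3B SLM on the synthesized rollouts.
# The paper uses Unsloth; this sketch uses the underlying HF/PEFT APIs directly.
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE = "meta-llama/Llama-3.2-3B-Instruct"   # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# Wrap the base model with LoRA adapters (rank within the reported 16-32 range).
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
))

# Records produced by the synthesis step (one placeholder example shown).
dataset = [{"task": "inspect the loading dock",
            "observation": '{"objects": ["truck"], "regions": ["dock"]}',
            "action": "goto(dock)"}]

def to_text(example):
    """Serialize a task/observation/action record into a training string (assumed template)."""
    return {"text": (f"Task: {example['task']}\n"
                     f"Observation: {example['observation']}\n"
                     f"Action: {example['action']}{tokenizer.eos_token}")}

train = Dataset.from_list(dataset).map(to_text)
train = train.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
                  remove_columns=train.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="prism-slm", num_train_epochs=5, learning_rate=2e-4,
        per_device_train_batch_size=4, gradient_accumulation_steps=4,
        bf16=True, logging_steps=20,
    ),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("prism-slm-adapters")  # saves LoRA adapters; merge into the base model before deployment if desired
```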
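The sub-5GB figure is consistent with running a 3B model in reduced precision. The paper's actual runtime stack is not detailed here, so the loading-and-inference sketch below (bitsandbytes 4-bit quantization, an assumed checkpoint path, and an assumed prompt format) should be read as one plausible configuration rather than the deployed system.

```python
# Hedged sketch of on-device inference with a 4-bit quantized distilled SLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

CHECKPOINT = "prism-slm"   # assumed path to the merged, distilled model
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(
    CHECKPOINT,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16),
    device_map="auto",
)

def next_action(task: str, observation: str) -> str:
    """One planning step: task and current observation in, next action out."""
    prompt = f"Task: {task}\nObservation: {observation}\nAction:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True).strip()

print(next_action("inspect the loading dock", '{"objects": ["truck"], "regions": ["dock"]}'))
```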
Implications and Discussion
PRISM demonstrates that high-quality, on-device SLM planners can be distilled from LLMs using only synthetic data, without manual annotation or simulation. This has several practical implications:
- Scalability: The approach is scalable to new domains and robot platforms, as it requires minimal human input and no real-world data collection.
- Robustness: On-device planners are robust to network failures and can be deployed in unstructured or remote environments.
- Reproducibility: The release of code, models, and datasets facilitates reproducibility and further research.
However, the method inherits certain limitations from the underlying LLMs. The distilled SLMs are constrained by the expressivity of the action and observation spaces and may struggle with tasks requiring complex spatial reasoning or formal action representations (e.g., code generation). Additionally, safety vulnerabilities present in LLM-enabled planners (e.g., unsafe action generation) are likely to persist in the distilled models, necessitating further research into integrated safety mechanisms.
Future Directions
Potential avenues for future work include:
- Improved Data Synthesis: Enhancing the quality and diversity of synthetic scenarios, particularly for tasks requiring advanced spatial or temporal reasoning.
- Safety Integration: Incorporating safety objectives or constraints directly into the distillation process, possibly via reinforcement learning or constitutional AI techniques.
- Broader Action Spaces: Extending the framework to planners with more complex or formal action representations, such as code or temporal logic.
Conclusion
PRISM provides a practical and effective solution for deploying LLM-based robot planners on-device, overcoming the computational and infrastructural barriers of LLMs. By automating data synthesis and distillation, it enables the creation of efficient, high-performing SLMs that can be readily integrated into existing robotic systems, broadening the applicability of language-driven robot planning in real-world settings.