- The paper introduces a novel distillation process that enforces self-consistency in pretrained diffusion policies to reduce inference steps.
- The methodology uses a teacher-student framework, pairing an EDM (Elucidated Diffusion Models) teacher with a CTM (Consistency Trajectory Model) student, to achieve one- and three-step inference and a significant speedup.
- Experiments in simulation and real-world tasks demonstrate competitive success rates, with highlights such as a 100% success rate on the Lift task and substantially reduced inference latency.
Accelerated Visuomotor Policies via Consistency Distillation
Introduction to Consistency Policy
Robotic systems, whether mobile or stationary, often can't afford the luxury of high-end GPUs due to space, weight, and power constraints. This limitation creates a roadblock for leveraging advanced visuomotor policy architectures that need extensive computational resources for fast policy inference. Enter Consistency Policy, a new approach designed to provide a faster yet competitively performing alternative to traditional Diffusion Policies.
The main idea behind Consistency Policy is to distill a pretrained Diffusion Policy into a more efficient model by enforcing self-consistency along the learned trajectories. This allows the distilled model to make decisions much quicker and with less computational power.
How It Works
Diffusion Models: A Quick Primer
Before diving into Consistency Policy, let's get a handle on diffusion models, which have shown impressive results in imitation learning for robotic control. In essence, they start from random noise and iteratively denoise it into the desired action sequence, as sketched below. This iterative procedure typically requires many network evaluations and significant computational power.
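To make the iterative denoising concrete, here is a minimal sketch of an Euler-style diffusion sampler for action generation, in the spirit of EDM-based samplers. It is illustrative only, not the paper's exact implementation: the `denoiser(noisy_actions, sigma, obs)` network, the noise schedule `sigmas`, and the batch/horizon/action dimensions are all hypothetical placeholders.

```python
import torch

def sample_action(denoiser, obs, sigmas):
    """Generic diffusion-style sampler for a visuomotor policy (illustrative
    sketch). `denoiser(noisy_actions, sigma, obs)` is assumed to predict the
    clean action sequence at noise level sigma."""
    # Start from pure Gaussian noise at the highest noise level.
    # Shapes (batch=1, horizon=16, action_dim=7) are placeholder values.
    action = torch.randn(1, 16, 7) * sigmas[0]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoiser(action, sigma, obs)        # predict clean actions
        d = (action - denoised) / sigma                # Euler step direction
        action = action + d * (sigma_next - sigma)     # move to the next noise level
    return action
```

Each loop iteration is one network forward pass, which is why a schedule with many noise levels translates directly into high per-action latency.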
The Need for Speed
Diffusion models are effective but slow. For example, Diffusion Policies built on Denoising Diffusion Probabilistic Models (DDPM) require many network forward evaluations per action generation, taking around one second on an NVIDIA T4 GPU. This latency is impractical for robots that need quick decision-making, such as dynamic object manipulation or agile navigation.
Enter Consistency Policy
Training Process
- Teacher Model (EDM Framework): The first step trains a teacher model with EDM (Elucidated Diffusion Models), a diffusion framework with a carefully designed noise schedule and denoiser parameterization. The teacher learns to predict actions by progressively denoising noisy action sequences conditioned on observations.
- Consistency Trajectory Model (CTM): Next, a student model is distilled from the teacher by enforcing self-consistency along the teacher's denoising trajectories: given different points on the same trajectory, the student is trained to produce the same predicted actions. This drastically reduces the number of denoising steps needed at inference time; a simplified sketch of the distillation objective follows this list.
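Below is a simplified sketch of a consistency-distillation training step under the assumptions above. The full CTM objective in the paper also involves a variable stop time and auxiliary losses; here `student`, `student_ema`, and `teacher` are hypothetical denoiser networks with the signature `f(noisy_actions, sigma, obs) -> predicted clean actions`, and the noise schedule `sigmas` is a placeholder.

```python
import torch
import torch.nn.functional as F

def consistency_distillation_loss(student, student_ema, teacher, actions, obs, sigmas):
    """One simplified consistency-distillation step: the student's prediction
    from a high noise level must match the (EMA) student's prediction from a
    point one teacher ODE step further along the same trajectory."""
    # Pick two adjacent noise levels on the teacher's schedule.
    i = torch.randint(0, len(sigmas) - 1, (1,)).item()
    sigma_hi, sigma_lo = sigmas[i], sigmas[i + 1]

    # Corrupt the ground-truth actions to the higher noise level.
    noise = torch.randn_like(actions)
    noisy_hi = actions + sigma_hi * noise

    # One teacher ODE (Euler) step from sigma_hi down to sigma_lo,
    # then evaluate the self-consistency target without gradients.
    with torch.no_grad():
        denoised = teacher(noisy_hi, sigma_hi, obs)
        noisy_lo = noisy_hi + (noisy_hi - denoised) / sigma_hi * (sigma_lo - sigma_hi)
        target = student_ema(noisy_lo, sigma_lo, obs)

    # The student, starting from the *higher* noise level, must agree.
    pred = student(noisy_hi, sigma_hi, obs)
    return F.mse_loss(pred, target)
```

The key design choice is that both points lie on the same teacher trajectory, so a well-trained student can jump from any noise level straight to the final prediction instead of walking the whole trajectory.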
Inference Speed
Consistency Policy offers two primary modes of inference, both sketched after this list:
- Single-Step Inference: For ultra-low-latency requirements, the model predicts actions in a single network evaluation.
- Three-Step Inference: Trades a little speed for higher accuracy by chaining three denoising steps.
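A minimal sketch of both modes, assuming the same hypothetical `student(noisy_actions, sigma, obs)` network as above; the specific noise levels, shapes, and the three-step schedule are illustrative placeholders, not values from the paper.

```python
import torch

def single_step_inference(student, obs, sigma_max=80.0):
    """One network call: map pure noise directly to an action sequence."""
    noise = torch.randn(1, 16, 7) * sigma_max        # placeholder shape
    return student(noise, sigma_max, obs)

def three_step_inference(student, obs, sigmas=(80.0, 2.0, 0.2)):
    """Chained inference: denoise, re-inject noise at a lower level, denoise again."""
    action = torch.randn(1, 16, 7) * sigmas[0]
    for i, sigma in enumerate(sigmas):
        action = student(action, sigma, obs)                       # jump toward clean actions
        if i < len(sigmas) - 1:                                    # re-noise at the next level
            action = action + sigmas[i + 1] * torch.randn_like(action)
    return action
```

Single-step inference minimizes latency at one forward pass, while the three-step variant spends two extra passes to clean up residual error, which is the speed/accuracy trade-off described above.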
Strong Numerical Results
The evaluation of Consistency Policy covered six simulation tasks and two real-world tasks, showing significant improvements in inference speed without a meaningful drop in performance.
Simulation Tasks
- Robomimic Tasks (Lift, Can, Square, Tool Hang): Consistency Policy achieved success rates comparable to DDPM and DDIM in a fraction of the time. For instance:
  - Lift: 100% success rate with 1-step inference, matching the best-performing methods.
  - Can: 98% success rate with 1-step inference, surpassing DDIM.
- Franka Kitchen & Push-T Tasks: Here, single-step Consistency Policy performed commendably, showing that it can handle both long-horizon and multi-stage tasks efficiently.
Real-World Applications
- Trash Clean Up Task: Consistency Policy maintained an 80% success rate while dramatically reducing inference time (21 ms versus 192 ms for DDIM).
- Plug Insertion Task: Similar trends were observed, highlighting the method’s robustness even in more intricate, contact-rich tasks.
Implications and Future Developments
Practical Implications: Consistency Policy opens the door for employing advanced visuomotor policies in resource-constrained environments. This makes high-level robot control feasible on devices with limited computational capabilities.
Theoretical Implications: The student's robust performance across teachers of varying quality suggests that extensive training or tuning of the teacher model may be unnecessary, simplifying the overall setup.
Future Directions: The success of Consistency Policy suggests several avenues for further research:
- Multimodality Improvements: Exploring richer sampling schemes to recover the multimodal behavior that can be lost during distillation.
- General Applicability: Extending the approach to other forms of robot policies and exploring its efficacy with other architectures like transformers.
Conclusion
In summary, Consistency Policy represents a leap forward for practical, efficient robot control, achieving substantial gains in inference speed while retaining competitive success rates across a variety of robotic tasks. Its balance of speed and performance makes it a promising tool for the broader application of AI in robotics.