Learning What to Say to Your VLA: Mostly Harmless Vision Language Action Model Steering

Published 10 Jun 2026 in cs.RO and cs.LG | (2606.12299v1)

Abstract: Vision-Language-Action (VLA) models provide a natural language interface to robot control, but the mapping from language to behavior is often brittle and unintuitive: semantically similar instructions can induce drastically different behaviors, while some capabilities may not be elicitable through prompting alone. As a result, both human instructions and zero-shot LLMs can fail to reliably steer VLAs toward successful task execution. In this work, we propose a framework that interactively searches for language sequences that improve closed-loop VLA task performance, distills these sequences into a test-time language feedback policy (LFP), and learns an improvement head that predicts when language steering will improve performance. We conformalize this improvement head to prevent harmful steering interventions, where the LFP decreases task performance relative to the original instruction on out-of-distribution scenarios. Crucially, our approach operates on arbitrary frozen pre-trained VLAs, requiring neither access to the original training distribution nor fine-tuning of the underlying model. On seen environments, our conformalized LFP improves base VLA performance by 24.7% in simulation and 65.0% in hardware. On visual and semantic perturbations, our conformalized LFP has strong harmlessness guarantees, and produces recovery behaviors not observed with open-loop prompting.


Authors (3)

Summary

The paper presents an interactive language search that distills effective language instructions to robustly steer VLA models and boost task success.
It combines narrated fine-tuning with Monte Carlo evaluation of candidate instructions; on seen environments, the conformalized policy improves base VLA performance by 24.7% in simulation and 65.0% in hardware.
The method employs a conformalized policy with harmlessness guarantees, significantly reducing harmful interventions under out-of-distribution shifts.

Interactive Language Steering for Vision-Language-Action Models

Problem Context and Objectives

Vision-Language-Action (VLA) models offer a unified, language-driven interface for embodied manipulation, but their test-time mapping from language to action is often brittle and unintuitive. Semantically similar instructions can yield very different behaviors, and some skills encoded in the underlying model cannot be elicited through prompting alone. Consequently, both naïve human instructions and zero-shot LLMs fail to reliably steer VLAs toward successful task execution. The paper "Learning What to Say to Your VLA: Mostly Harmless Vision Language Action Model Steering" (2606.12299) addresses this shortcoming with a framework that interactively searches for language sequences that improve closed-loop task performance, distills these sequences into a deployable Language Feedback Policy (LFP), and learns a calibrated improvement head that predicts when language steering is beneficial.

Methodology

The core contributions are organized into three interlocked phases: narrated fine-tuning, interactive language search, and conformalized policy deployment.

Narrated Fine-Tuning

Using vision-language models (VLMs) for fine-grained narration, the framework converts observation-only demonstrations into temporally aligned text descriptions of robot behavior in context. These narrations provide a strong, structured prior for efficient search over the combinatorially vast language space, which sidesteps the intractability of searching directly over open-vocabulary language.
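To make "temporally aligned text descriptions" concrete, the following is a minimal sketch of one possible representation, not the paper's actual data format. The names NarratedSegment and per_frame_narration are hypothetical, and the assumption is that a VLM has already produced narration segments with frame ranges.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class NarratedSegment:
    """Hypothetical container for one narrated span of a demonstration."""
    start: int  # first frame index covered (inclusive)
    end: int    # frame index where the span stops (exclusive)
    text: str   # VLM-produced description of the robot's behavior


def per_frame_narration(
    segments: Sequence[NarratedSegment], n_frames: int, default: str = ""
) -> List[str]:
    """Expand narrated segments into one text label per frame.

    Frames not covered by any segment keep `default`. If segments overlap,
    the one with the later start index wins.
    """
    labels = [default] * n_frames
    for seg in sorted(segments, key=lambda s: s.start):
        for t in range(max(seg.start, 0), min(seg.end, n_frames)):
            labels[t] = seg.text
    return labels


segments = [
    NarratedSegment(0, 2, "reach toward the drawer handle"),
    NarratedSegment(2, 5, "grasp the handle and pull"),
]
labels = per_frame_narration(segments, n_frames=6)
```

Pairing each frame with its label in this way yields (observation, text) examples of the kind a language feedback policy could be fine-tuned on; how the paper actually constructs its training pairs is not specified here.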

Interactive Language Search

A local search is performed within the neighborhood of narrated sequences generated by the initial VLM. Using LLMs (such as GPT-5.4), the method generates trajectory-level semantic perturbations of narrated action sequences, which are then validated via in-situ rollouts with the frozen VLA. Monte Carlo estimates of improvement are computed for each candidate, and sequences empirically outperforming the baseline instruction are distilled into the LFP via rejection fine-tuning.
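The search step can be illustrated with a small sketch. It assumes a rollout function that returns 1.0 on task success and 0.0 on failure; toy_rollout below is a hypothetical stand-in for a rollout with the frozen VLA, and mc_success_rate and select_improving are illustrative names rather than the authors' API. Reusing the same seeds for the baseline and every candidate is a design choice of this sketch, not something the source states.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: (language sequence, seed) -> 1.0 on success, 0.0 on failure.
Rollout = Callable[[Sequence[str], int], float]


def mc_success_rate(
    rollout: Rollout, instructions: Sequence[str], n_rollouts: int, seed: int = 0
) -> float:
    """Monte Carlo estimate of the success rate for one language sequence."""
    return mean(rollout(instructions, seed + i) for i in range(n_rollouts))


def select_improving(
    rollout: Rollout,
    base: Sequence[str],
    candidates: Sequence[Sequence[str]],
    n_rollouts: int = 20,
    margin: float = 0.0,
) -> List[Tuple[List[str], float]]:
    """Keep candidates whose estimated gain over the base instruction exceeds `margin`.

    The survivors, sorted by estimated gain, are the sequences a rejection
    fine-tuning step would train on.
    """
    base_rate = mc_success_rate(rollout, base, n_rollouts)
    kept = []
    for cand in candidates:
        gain = mc_success_rate(rollout, cand, n_rollouts) - base_rate
        if gain > margin:
            kept.append((list(cand), gain))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def toy_rollout(instructions: Sequence[str], seed: int) -> float:
    """Stand-in simulator (assumption): success probability depends on wording."""
    p = 0.9 if "grasp the handle" in instructions else 0.4
    return 1.0 if random.Random(seed).random() < p else 0.0


kept = select_improving(
    toy_rollout,
    base=["open the drawer"],
    candidates=[["grasp the handle", "pull"], ["open drawer please"]],
    n_rollouts=200,
)
```

In this toy setting only the first candidate survives, because the second behaves identically to the baseline and therefore shows zero estimated gain. With a real VLA, the number of rollouts per candidate trades off estimate variance against interaction cost.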

Conformalized Steering via Improvement Head

Because language steering may be harmful under OOD shifts (semantics or visuals not covered in training), an improvement head is trained to predict the expected gain from steering. To make deployment robust, the head is calibrated with class-conditional conformal prediction, which bounds the false positive rate of harmful interventions under OOD shift at a specified level. The system executes the LFP only when the predicted improvement exceeds this calibrated threshold; otherwise, it falls back to the base instruction, ensuring a "do no harm" property.
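The gating rule can be sketched with a standard split-conformal quantile computed on the harmful class only. This is an illustration under stated assumptions, not the paper's exact calibration procedure: harmful_class_threshold and choose_instruction are hypothetical names, the calibration scores are assumed to be improvement-head outputs on held-out episodes where steering did not help, and the bound relies on test-time harmful episodes being exchangeable with those calibration episodes.

```python
import math
from typing import Sequence


def harmful_class_threshold(harmful_scores: Sequence[float], alpha: float) -> float:
    """Threshold such that a new harmful episode scores above it with probability <= alpha.

    Uses the usual finite-sample rank k = ceil((n + 1) * (1 - alpha)) over the
    calibration scores of the harmful class. If k exceeds n, there is too little
    calibration data for this alpha, and the gate never steers.
    """
    n = len(harmful_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(harmful_scores)[k - 1]


def choose_instruction(score: float, threshold: float, steered: str, base: str) -> str:
    """Execute the steered instruction only when the predicted improvement clears the threshold."""
    return steered if score > threshold else base


# Improvement-head scores on calibration episodes where steering was harmful (made-up values).
calibration_scores = [0.3, 0.7, 0.1, 0.5, 0.2, 0.6, 0.4]
threshold = harmful_class_threshold(calibration_scores, alpha=0.25)
decision = choose_instruction(0.9, threshold, steered="grasp the handle", base="open the drawer")
```

Calibrating on the harmful class alone is what makes the guarantee class-conditional: the bound applies to the rate of steering among episodes where steering would hurt, rather than to the overall error rate.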

Empirical Evaluation

Experimental Domains

Experiments were conducted in both simulation (LIBERO-OOD manipulation suite) and on physical hardware (Franka Emika manipulator), across a wide range of visual and task-instruction perturbations including cross-domain transfer and compositional generalization settings.

Main Findings

In-Distribution and Out-of-Distribution Robustness: On seen environments, the conformalized LFP increases closed-loop VLA success rate by 24.7% in simulation and 65.0% in hardware relative to the base VLA, and it retains strong harmlessness guarantees on unseen perturbations.
Sample Efficiency: Language steering achieves the same or better task success as direct action fine-tuning with as little as 20% of the fine-tuning data, demonstrating more efficient utilization of limited demonstration budgets.
Closed-Loop Superiority: Trajectory-level closed-loop language feedback outperforms open-loop (static) prompt rewriting (75.0% vs. ~71% mean success in simulation under distribution shift), because it can adaptively inject language as the scene evolves and recover from perturbations in real time.
Harmlessness Guarantees: Conformal calibration reduced the empirical false positive rate for harmful interventions from 38.9% to 9.3% in simulation, and from 61.1% to 2.2% in hardware, closely matching the specified deployment target.
Generalization: The method enhances compositional generalization to unseen combinations of behaviors and robustly transfers to novel tasks provided the underlying VLA contains the requisite motor primitives.
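The false positive rates quoted above can be read as the fraction of harmful cases in which the system steered anyway. The sketch below computes that quantity from episode logs; the Episode record and its fields are hypothetical, and the exact definition used in the paper may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Episode:
    """Hypothetical evaluation record for one test episode."""
    steered: bool  # did the gate execute the LFP instead of the base instruction?
    harmful: bool  # would steering lower success relative to the base instruction?


def harmful_intervention_fpr(episodes: Iterable[Episode]) -> float:
    """Fraction of harmful episodes in which the system steered anyway.

    Returns 0.0 when there are no harmful episodes to count.
    """
    harmful = [e for e in episodes if e.harmful]
    if not harmful:
        return 0.0
    return sum(e.steered for e in harmful) / len(harmful)


log = [
    Episode(steered=True, harmful=True),
    Episode(steered=False, harmful=True),
    Episode(steered=False, harmful=True),
    Episode(steered=False, harmful=True),
    Episode(steered=True, harmful=False),
]
fpr = harmful_intervention_fpr(log)
```

Comparing this rate with and without the conformal gate, on the same set of episodes, is the kind of comparison the reported reductions describe.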

Theoretical and Practical Implications

This framework establishes a practical protocol for steering frozen VLAs via language, without fine-tuning the underlying model or accessing its original training distribution. This abstraction, operating at the interface between high-level task specification and policy execution, enables:

Safe Deployment: Model interventions occur only when reliably predicted to be beneficial, critical for embodied deployment where physical errors are costly.
Data-Efficient Adaptation: The demonstrated efficiency in leveraging observational data and interaction rollouts suggests a promising path for low-resource adaptation and generalization.
Separation of Concerns: By learning when to steer as well as what to say, the system decouples policy improvement from risk management, providing a template for integrating other modalities of intervention (e.g., latent/action steering, observation editing).

Limitations and Future Directions

Despite the guarantees, relying solely on language steering limits expressivity, especially when the VLA is fundamentally unsteerable for certain goals. The language search is restricted to local perturbations, so global optima in the prompt space can be missed. Natural next steps include expanding the search space, integrating other forms of intervention such as observation injection (Hancock et al., 2024) and latent editing (Häon et al., 2025), and investigating hybrid control hierarchies. Additionally, the inference latency of autoregressive language generation is a nontrivial bottleneck at scale, motivating future work on lightweight policy distillation and asynchronous update architectures.

Conclusion

The presented approach provides a sample-efficient, safety-aware solution for test-time steering of frozen VLAs using closed-loop, interactively learned language feedback, with empirical and statistical safeguards against harmful interventions. By formalizing the language steering problem as an MDP and integrating a conformalized improvement predictor, the framework offers robust generalization and recovery capabilities under distribution shift. This addresses a key challenge in practical VLA deployment and motivates broader exploration of high-level interface learning and confidence-calibrated control in embodied AI (2606.12299).