Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots

Published 3 Apr 2026 in cs.RO | (2604.02911v1)

Abstract: Achieving quadruped robot locomotion across diverse and dynamic terrains presents significant challenges, primarily due to the discrepancies between simulation environments and real-world conditions. Traditional sim-to-real transfer methods often rely on manual feature design or costly real-world fine-tuning. To address these limitations, this paper proposes the DreamTIP framework, which incorporates Task-Invariant Properties learning within the Dreamer world model architecture to enhance sim-to-real transfer capabilities. Guided by LLMs, DreamTIP identifies and leverages Task-Invariant Properties, such as contact stability and terrain clearance, which exhibit robustness to dynamic variations and strong transferability across tasks. These properties are integrated into the world model as auxiliary prediction targets, enabling the policy to learn representations that are insensitive to underlying dynamic changes. Furthermore, an efficient adaptation strategy is designed, employing a mixed replay buffer and regularization constraints to rapidly calibrate to real-world dynamics while effectively mitigating representation collapse and catastrophic forgetting. Extensive experiments on complex terrains, including Stair, Climb, Tilt, and Crawl, demonstrate that DreamTIP significantly outperforms state-of-the-art baselines in both simulated and real-world environments. Our method achieves an average performance improvement of 28.1% across eight distinct simulated transfer tasks. In the real-world Climb task, the baseline method achieved only a 10% success rate, whereas our method attained a 100% success rate. These results indicate that incorporating Task-Invariant Properties into Dreamer learning offers a novel solution for achieving robust and transferable robot locomotion.

Authors (8)

Summary

The paper introduces DreamTIP, which uses LLM-driven TIP extraction to robustly transfer quadruped control policies from simulation to real-world scenarios.
The methodology combines simulation pretraining with efficient real-world adaptation via mixed replay buffers, reference model regularization, and selective parameter updates.
Empirical results demonstrate a 28.1% average improvement in transfer tasks and a 100% success rate on challenging real-world obstacles, underscoring its data efficiency and robustness.

Introduction

The paper "Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots" (2604.02911) addresses the core challenge of efficient sim-to-real transfer for quadruped locomotion, which is significantly impeded by the divergence between simulated and physical environments. Traditional approaches, including domain randomization, simulator fidelity improvement, and domain adaptation, have notable limitations: coverage gaps in randomization, high cost of simulation fidelity, and instability or high overhead in adaptation methods. The study posits that one cause of poor policy transfer is the model's over-dependence on simulation-specific dynamic parameters as opposed to fundamental, task-invariant properties.

DreamTIP Framework

The DreamTIP (Dreamer with Task-Invariant Properties) framework extends the Dreamer world model by explicitly extracting and leveraging Task-Invariant Properties (TIPs) (Figure 1). TIPs such as contact stability and terrain clearance are abstracted by LLM-driven analysis of task descriptions and privileged state observations. Concretely, an LLM generates a TIP extractor function, which maps privileged robot observations to a reduced property space designed to be robust across variations in environment dynamics.
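To make the idea of a TIP extractor concrete, the following is a minimal sketch of what such a function could look like. It is not the function generated in the paper: the `PrivilegedObs` fields, the three output properties, and the 0.05 m clearance threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PrivilegedObs:
    """Hypothetical privileged simulator state; all field names are assumptions."""
    foot_contact: Sequence[bool]   # one flag per foot, True if touching the ground
    foot_heights: Sequence[float]  # foot height above local terrain, in metres
    base_height: float             # body height above local terrain, in metres


def extract_tips(obs: PrivilegedObs, min_clearance: float = 0.05) -> List[float]:
    """Map a privileged observation to a small property vector.

    Returns [contact_stability, swing_clearance, body_clearance]. The choice of
    properties and the 0.05 m threshold are illustrative, not from the paper.
    """
    n_feet = len(obs.foot_contact)
    # Fraction of feet currently supporting the robot.
    contact_stability = sum(obs.foot_contact) / n_feet
    # Height of the lowest swing foot relative to the threshold, clipped to [0, 1].
    swing = [h for h, c in zip(obs.foot_heights, obs.foot_contact) if not c]
    if swing:
        swing_clearance = max(0.0, min(1.0, min(swing) / min_clearance))
    else:
        swing_clearance = 1.0  # no foot is in swing, so none can clip the terrain
    return [contact_stability, swing_clearance, obs.base_height]


obs = PrivilegedObs(
    foot_contact=[True, True, False, True],
    foot_heights=[0.0, 0.0, 0.08, 0.0],
    base_height=0.30,
)
tips = extract_tips(obs)  # three feet in stance, one swing foot 8 cm above ground
```

The point of the sketch is that the output depends on geometric and contact quantities rather than on simulator parameters such as friction or motor gains, which is what would make such a vector a plausible candidate for transfer.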

Figure 1: The DreamTIP paradigm augments Dreamer by incorporating TIPs, decreasing reliance on environment-specific dynamics.

This architecture consists of two principal stages (Figure 2): (1) simulation-phase pretraining, wherein DreamTIP learns to predict both standard observations and TIPs, thereby structuring its latent space around transferable priors; (2) deployment-phase adaptation, wherein the model is calibrated to real robot dynamics using a small number of physical rollouts with a mixed replay buffer and regularization mechanisms to counteract representation collapse and catastrophic forgetting.

Figure 2: The DreamTIP framework: simulation-based TIP learning and efficient adaptation to real dynamics with limited rollouts.

Task-Invariant Properties and LLM Integration

A central contribution is the design and utilization of the TIP extractor, automated via LLM reasoning. The pretraining phase presents both high-level task descriptions and full privileged observation spaces to the LLM, which returns a property-extraction function. This function generates, at each timestep, a property vector summarizing robust behavioral constraints—e.g., ensuring sufficient terrain clearance and maintaining foot contact stability across locomotion tasks.

The world model is then trained with auxiliary prediction objectives: for each experience sample, DreamTIP's latent state must not only reconstruct the observation but also predict the TIP vector. This regularizes the learned dynamics representation to be insensitive to irrelevant domain-specific variations (Figure 2).
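The auxiliary objective described above can be sketched as a weighted sum of two prediction errors. This is a simplified illustration: the use of mean squared error, the `tip_weight` coefficient, and the function names are assumptions, and the other terms a Dreamer-style world model normally optimizes (such as reward prediction and latent dynamics losses) are omitted.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def world_model_loss(
    obs_pred: Sequence[float],
    obs_target: Sequence[float],
    tip_pred: Sequence[float],
    tip_target: Sequence[float],
    tip_weight: float = 1.0,  # hypothetical weighting coefficient
) -> float:
    """Observation reconstruction loss plus a weighted auxiliary TIP loss."""
    reconstruction = mse(obs_pred, obs_target)
    tip_prediction = mse(tip_pred, tip_target)
    return reconstruction + tip_weight * tip_prediction


loss = world_model_loss(
    obs_pred=[1.0, 2.0], obs_target=[1.0, 4.0],
    tip_pred=[0.5], tip_target=[0.0],
    tip_weight=2.0,
)
```

Because both heads read from the same latent state, gradients from the TIP term shape the representation even though the TIP targets are only available from privileged simulator state.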

Efficient Real-World Adaptation

Upon deployment, efficient sim-to-real adaptation is critical due to data scarcity and the risk of overfitting/forgetting. DreamTIP approaches this via:

Mixed replay buffer: Real-world rollouts are merged with simulated transitions, ensuring the gradient updates preserve both pre-trained knowledge and adaptation to real dynamics.
Reference model regularization: A frozen copy of the pre-trained world model serves as a reference; adaptation updates are regularized with a negative cosine similarity alignment between current and reference latent features, constraining representational drift.
Selective parameter update: The recurrent module in DreamTIP is frozen during adaptation, focusing updates on components most relevant to encoding new dynamics without destabilizing temporal structure.

Together, these mechanisms reduce the sample complexity of adaptation and improve sim-to-real reliability.
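The three adaptation mechanisms listed above can be sketched in a few lines of plain Python. This is an illustrative outline only: the function names, the `real_fraction` mixing ratio, the `recurrent.` parameter prefix, and the `eps` constant are assumptions, and the paper's actual sampling scheme and loss weighting are not reproduced here.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def sample_mixed_batch(sim_buffer: list, real_buffer: list, batch_size: int,
                       real_fraction: float, rng: random.Random) -> list:
    """Draw a batch that mixes real and simulated transitions (with replacement)."""
    n_real = min(round(batch_size * real_fraction), batch_size) if real_buffer else 0
    batch = [rng.choice(real_buffer) for _ in range(n_real)]
    batch += [rng.choice(sim_buffer) for _ in range(batch_size - n_real)]
    rng.shuffle(batch)
    return batch


def negative_cosine(current: Sequence[float], reference: Sequence[float],
                    eps: float = 1e-8) -> float:
    """Negative cosine similarity: close to -1 when the two features align."""
    dot = sum(c * r for c, r in zip(current, reference))
    norm = (math.sqrt(sum(c * c for c in current))
            * math.sqrt(sum(r * r for r in reference)))
    return -dot / (norm + eps)


def trainable_parameters(params: Dict[str, object],
                         frozen_prefixes: Tuple[str, ...] = ("recurrent.",)) -> List[str]:
    """Names of parameters that stay trainable; the recurrent module is frozen."""
    return [name for name in params if not name.startswith(frozen_prefixes)]


rng = random.Random(0)
sim = [("sim", i) for i in range(10)]
real = [("real", i) for i in range(3)]
batch = sample_mixed_batch(sim, real, batch_size=8, real_fraction=0.25, rng=rng)
drift_penalty = negative_cosine([1.0, 0.0], [2.0, 0.0])  # aligned features
names = trainable_parameters({"encoder.w": 0, "recurrent.gru": 0, "decoder.w": 0})
```

In a real training loop the drift penalty would be added to the world model loss, with the reference features computed by the frozen pre-trained copy, so that minimizing it pulls the adapted latent features back toward the reference.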

Empirical Evaluation

Extensive experimental validation was conducted on both simulated (Isaac Gym-based) and real-world (Unitree Go2) platforms, across a challenging suite of eight terrain-influenced transfer tasks, including stair, gap, climb, crawl, and tilt (Figure 3).

Figure 3: Representative terrain configurations in both simulation and real-world tests.

Figure 4 shows that DreamTIP achieves a 28.1% average improvement across all simulated transfer tasks, sustaining performance where baselines such as WMP and DreamTIP-DWL degrade rapidly as task difficulty increases. Notably, in the hardest crawl scenario, DreamTIP attains a reward of 25.35 while the baseline collapses to 5.66, roughly 78% lower.

Figure 4: DreamTIP outperforms all baselines on 8 sim-to-real transfer tasks, sustaining reward as difficulty increases.

Real-world deployments further corroborate transfer robustness. On the physical Climb task with a 52 cm obstacle, DreamTIP achieved 100% success rate, compared to just 10% for the primary baseline and 90% for the ablated version lacking sim-to-real adaptation. Other tasks (Stair, Tilt, Crawl) similarly show DreamTIP's high resilience.

Qualitatively, Figure 5 shows that DreamTIP enables the robot to traverse a 25 cm crawl obstacle without collision, whereas the baseline fails, with the robot's head striking the obstacle in both simulated and real settings.

Figure 5: DreamTIP enables safe real-world traversal in crawl tasks where the baseline policy leads to collisions.

An ablation on the volume of real-world adaptation data (Figure 6) shows diminishing returns beyond 5 adaptation trajectories, supporting DreamTIP’s claim of efficiency. Additionally, a comparison of LLM-generated TIP formulations (GPT-5, DeepSeekV3) against classical ones supports the generalizability gains from language-informed property extraction.

Figure 6: Task success as a function of adaptation sample count reveals rapid performance saturation, underscoring adaptation efficiency.

Implications and Future Directions

DreamTIP fundamentally shifts world model design toward explicit, transferable regularization via property abstraction, supported by LLM knowledge. The approach maintains strong sim-to-real policy performance under challenging, high-variance terrain and dynamic shifts, with moderate adaptation data requirements. Practically, this bodes well for the deployment of agile quadruped robots in real-world tasks, where data is limited and environments are unpredictable.

Theoretically, the results support the broader vision of compositional world models that combine abstract property prediction, language-driven insight, and efficient, regularized adaptation to improve policy invariance in robotic control. However, the approach is still susceptible to accumulated error from world model imperfections during extended operations.

Potential future work includes:

Leveraging richer and more diverse simulation and real datasets for long-term error mitigation and further robustness.
Exploring compositional TIP architectures or hierarchical LLM-extracted priors.
Investigating the application of DreamTIP in other high-variance robotics domains (e.g., manipulation, aerial vehicles).

Conclusion

The study presents DreamTIP, an LLM-augmented Dreamer extension that incorporates explicit learning of task-invariant properties to robustly bridge sim-to-real gaps in quadrupedal locomotion (2604.02911). The results establish significant advances in transfer performance, robustness, and data efficiency, validated in challenging settings. DreamTIP exemplifies a framework for using language-driven property extraction and regularization-based adaptation as scalable tools for adaptive robotic policy learning.
