
A Minimalist Prompt for Zero-Shot Policy Learning (2405.06063v1)

Published 9 May 2024 in cs.LG

Abstract: Transformer-based methods have exhibited significant generalization ability when prompted with target-domain demonstrations or example solutions during inference. Although demonstrations, as a way of task specification, can capture rich information that may be hard to specify by language, it remains unclear what information is extracted from the demonstrations to help generalization. Moreover, assuming access to demonstrations of an unseen task is impractical or unreasonable in many real-world scenarios, especially in robotics applications. These questions motivate us to explore what the minimally sufficient prompt could be to elicit the same level of generalization ability as the demonstrations. We study this problem in the contextual RL setting, which allows for quantitative measurement of generalization and is commonly adopted by meta-RL and multi-task RL benchmarks. In this setting, the training and test Markov Decision Processes (MDPs) differ only in certain properties, which we refer to as task parameters. We show that conditioning a decision transformer on these task parameters alone can enable zero-shot generalization on par with or better than its demonstration-conditioned counterpart. This suggests that task parameters are essential for generalization and that DT models attempt to recover them from the demonstration prompt. To extract the remaining generalizable information from the supervision, we introduce an additional learnable prompt, which is shown to further boost zero-shot generalization across a range of robotic control, manipulation, and navigation benchmark tasks.
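
The abstract describes conditioning a decision transformer on task parameters instead of demonstration prompts, together with an additional learnable prompt. The sketch below illustrates one way such conditioning could be wired up. It is not the authors' implementation: the module names, token ordering, dimensions, and the omission of timestep embeddings are assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' code) of a decision-transformer-style policy
# conditioned on task parameters plus a small set of learnable prompt tokens.
import torch
import torch.nn as nn


class TaskParamConditionedDT(nn.Module):
    def __init__(self, state_dim, act_dim, task_param_dim,
                 d_model=128, n_prompt_tokens=4, n_layers=3, n_heads=4):
        super().__init__()
        # Per-modality embeddings, as in a standard decision transformer
        # (timestep/positional embeddings are omitted here for brevity).
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_return = nn.Linear(1, d_model)
        # Task parameters (e.g. a goal position or dynamics coefficient) replace
        # the demonstration prompt as the conditioning signal.
        self.embed_task = nn.Linear(task_param_dim, d_model)
        # Extra learnable prompt tokens, trained jointly with the policy.
        self.learnable_prompt = nn.Parameter(0.02 * torch.randn(n_prompt_tokens, d_model))
        self.n_prompt_tokens = n_prompt_tokens

        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, states, actions, returns_to_go, task_params):
        # states: (B, T, state_dim); actions: (B, T, act_dim);
        # returns_to_go: (B, T, 1); task_params: (B, task_param_dim)
        B, T, _ = states.shape
        r = self.embed_return(returns_to_go)
        s = self.embed_state(states)
        a = self.embed_action(actions)
        # Interleave (return, state, action) tokens per timestep: (B, 3T, d_model).
        traj = torch.stack([r, s, a], dim=2).reshape(B, 3 * T, -1)
        # Prepend the learnable prompt tokens and the task-parameter token.
        prompt = self.learnable_prompt.unsqueeze(0).expand(B, -1, -1)
        task_tok = self.embed_task(task_params).unsqueeze(1)
        x = torch.cat([prompt, task_tok, traj], dim=1)
        # Causal mask so each trajectory token attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        h = self.transformer(x, mask=mask)
        # Read the hidden state at every state token to predict the next action.
        state_hidden = h[:, self.n_prompt_tokens + 2::3]
        return self.predict_action(state_hidden)
```

Under this sketch, zero-shot evaluation on an unseen task would supply only that task's parameters (and the shared learnable prompt); no demonstrations of the new task are required, which matches the setting described in the abstract.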

Authors (5)
  1. Meng Song (9 papers)
  2. Xuezhi Wang (64 papers)
  3. Tanay Biradar (1 paper)
  4. Yao Qin (41 papers)
  5. Manmohan Chandraker (108 papers)
