PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control (2402.10450v3)
Abstract: Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making. In this work, we propose a novel view that treats inducing temporal action abstractions as a sequence compression problem. To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains. We introduce an approach called Primitive Sequence Encoding (PRISE) that combines continuous action quantization with BPE to learn powerful action abstractions. We empirically show that high-level skills discovered by PRISE from a multitask set of robotic manipulation demonstrations significantly boost the performance of both multitask imitation learning as well as few-shot imitation learning on unseen tasks. Our code is released at https://github.com/FrankZheng2022/PRISE.
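The core idea of PRISE is to treat a trajectory of quantized actions as a token sequence and run byte pair encoding (BPE) over it, so that frequently recurring action sub-sequences become single "skill" tokens. As an illustrative sketch only (not the authors' implementation; the function name and data layout are assumptions), the BPE merge loop over discrete action codes looks like this:

```python
from collections import Counter

def bpe_vocab(corpus, num_merges):
    """Learn a BPE merge table over sequences of discrete tokens.

    `corpus` is a list of token sequences (e.g. quantized action codes).
    Each merge replaces the most frequent adjacent pair with a fresh
    token id, so common sub-sequences collapse into single tokens.
    """
    corpus = [list(seq) for seq in corpus]
    merges = []
    next_token = max(t for seq in corpus for t in seq) + 1
    for _ in range(num_merges):
        # Count all adjacent pairs across the corpus.
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges.append((best, next_token))
        # Replace every occurrence of the best pair with the new token.
        for i, seq in enumerate(corpus):
            out, j = [], 0
            while j < len(seq):
                if j + 1 < len(seq) and (seq[j], seq[j + 1]) == best:
                    out.append(next_token)
                    j += 2
                else:
                    out.append(seq[j])
                    j += 1
            corpus[i] = out
        next_token += 1
    return merges, corpus
```

For example, with `corpus = [[0, 1, 0, 1, 2], [0, 1, 2]]` and one merge, the pair `(0, 1)` is most frequent and becomes token `3`, compressing the sequences to `[[3, 3, 2], [3, 2]]`. In PRISE, the continuous actions would first be discretized (e.g. by vector quantization) before this step.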