PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control (2402.10450v3)

Published 16 Feb 2024 in cs.LG

Abstract: Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making. In this work, we propose a novel view that treats inducing temporal action abstractions as a sequence compression problem. To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains. We introduce an approach called Primitive Sequence Encoding (PRISE) that combines continuous action quantization with BPE to learn powerful action abstractions. We empirically show that high-level skills discovered by PRISE from a multitask set of robotic manipulation demonstrations significantly boost the performance of both multitask imitation learning as well as few-shot imitation learning on unseen tasks. Our code is released at https://github.com/FrankZheng2022/PRISE.

Summary

  • The paper introduces PRISE, which reformulates temporal action abstraction as a sequence compression task using NLP-inspired techniques.
  • It employs continuous action quantization with Byte Pair Encoding to enhance multitask and few-shot imitation learning performance.
  • Empirical evaluations on the Meta-World and LIBERO benchmarks demonstrate PRISE's gains in learning efficiency on robotic manipulation tasks.

PRISE: Enhancing Decision Making in Robotics with Temporal Action Abstractions

Introduction to Primitive Sequence Encoding (PRISE)

Temporal action abstractions are a central concern in robotics, particularly for sequential decision-making. The paper introduces Primitive Sequence Encoding (PRISE), which recasts the learning of temporal action abstractions as a sequence compression problem of the kind solved by tokenizers in NLP, specifically Byte Pair Encoding (BPE). By coupling continuous action quantization with BPE, PRISE extracts versatile, high-level skills from robotic manipulation demonstrations; these skills markedly improve both multitask imitation learning (IL) and few-shot IL on novel tasks. A minimal sketch of the BPE compression step PRISE borrows appears below.
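To make the borrowed NLP machinery concrete, here is a minimal, self-contained sketch of the BPE merge loop running over a sequence of discrete action codes. This is an illustrative sketch only: the toy codes and the `num_merges` parameter are assumptions for exposition, not the paper's implementation.

```python
from collections import Counter

def bpe_merges(sequence, num_merges):
    """Greedily merge the most frequent adjacent token pair.

    `sequence` is a list of discrete codes (e.g. quantized action indices);
    each merge introduces a composite token standing for a longer run of
    primitives -- a variable-length "skill" token.
    """
    seq = list(sequence)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so merging cannot compress further
        merges.append((a, b))
        # Rewrite the sequence, replacing each occurrence of (a, b).
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append((a, b))
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, merges

# Toy usage: integer codes stand for quantized low-level actions.
compressed, vocab = bpe_merges([0, 1, 2, 0, 1, 2, 0, 1, 3], num_merges=2)
print(compressed)  # [((0, 1), 2), ((0, 1), 2), (0, 1), 3]
print(vocab)       # merge rules in the order learned: [(0, 1), ((0, 1), 2)]
```

Each merge rule acts as a new vocabulary entry; applied to fresh demonstrations, the rules segment low-level action streams into longer, reusable skill tokens.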

Theoretical Foundations and Proposed Methodology

PRISE adapts discrete coding and sequence compression strategies from NLP to continuous control, opening a new direction in action representation learning. Continuous, high-dimensional actions are first mapped to discrete codes; BPE is then run over these code sequences to yield temporally abstracted skills. The paper lays out PRISE's two-stage pretraining: action quantization, followed by temporal action abstraction via BPE. Extensive ablation studies show that the BPE stage is essential to PRISE's performance gains. The sketch below illustrates the quantization step.
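As a companion to the BPE sketch above, the following is a minimal nearest-neighbor quantizer that turns continuous action vectors into the discrete codes BPE consumes. PRISE learns its codebook (in the spirit of VQ-VAE) and conditions on observations; the fixed random codebook, the action dimension, and the synthetic demonstration here are illustrative assumptions only.

```python
import numpy as np

def quantize_actions(actions, codebook):
    """Map each continuous action to the index of its nearest code vector.

    actions:  (T, action_dim) continuous actions from one demonstration.
    codebook: (K, action_dim) code vectors (learned in PRISE; random here).
    Returns a length-T integer code sequence, ready for BPE.
    """
    # Pairwise squared Euclidean distances, shape (T, K).
    dists = ((actions[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 7))  # K=16 codes for 7-DoF actions (assumed)
demo = rng.normal(size=(50, 7))      # one synthetic 50-step demonstration
codes = quantize_actions(demo, codebook)
print(codes[:10])                    # integer codes in [0, 16)
```

Feeding `codes` into the BPE sketch above yields variable-length skill tokens, so a downstream policy can predict skill tokens instead of raw per-step actions, consistent with the ablations showing the BPE stage is essential.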

Empirical Evaluation and Results

PRISE was evaluated on multitask robotic manipulation benchmarks, including Meta-World and LIBERO. Across these benchmarks it outperforms contemporary approaches in both multitask imitation learning and few-shot adaptation to unseen tasks. The paper's quantitative analysis attributes the gains in learning efficiency and adaptation capability to the BPE-based action abstraction.

Implications and Forward-Thinking Aspects

Beyond its methodological contribution to learning temporal action abstractions, PRISE sets a precedent for bringing discrete coding and sequence compression techniques from NLP into continuous control. Looking ahead, the paper discusses scaling PRISE to larger, more diverse datasets spanning different robot embodiments, and the possibility of coupling its pretrained skill tokens with LLMs for broader task generalization and adaptability.

Conclusion

In sum, PRISE is a notable step forward for sequential decision making in robotics, built on a novel application of NLP sequence compression to learning temporal action abstractions. The paper backs PRISE's central idea with empirical evidence and points toward a future in which language-modeling techniques play a larger role in robotic decision making, with gains in adaptability and efficiency.