Emergent Mind

RT-1: Robotics Transformer for Real-World Control at Scale

(arXiv:2212.06817)
Published Dec 13, 2022 in cs.RO, cs.AI, cs.CL, cs.CV, and cs.LG

Abstract

By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing, or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer1.github.io.

