Memory-Consistent Neural Networks for Imitation Learning (2310.06171v2)
Abstract: Imitation learning considerably simplifies policy synthesis compared to alternative approaches by exploiting access to expert demonstrations. For such imitation policies, errors away from the training samples are particularly critical. Even rare slip-ups in the policy action outputs can compound quickly over time, since they lead to unfamiliar future states where the policy is still more likely to err, eventually causing task failures. We revisit simple supervised ``behavior cloning'' for conveniently training the policy from nothing more than pre-recorded demonstrations, but carefully design the model class to counter the compounding error phenomenon. Our ``memory-consistent neural network'' (MCNN) outputs are hard-constrained to stay within clearly specified permissible regions anchored to prototypical ``memory'' training samples. We provide a guaranteed upper bound for the sub-optimality gap induced by MCNN policies. Using MCNNs on 10 imitation learning tasks, with MLP, Transformer, and Diffusion backbones, spanning dexterous robotic manipulation and driving, proprioceptive inputs and visual inputs, and varying sizes and types of demonstration data, we find large and consistent gains in performance, validating that MCNNs are better-suited than vanilla deep neural networks for imitation learning applications. Website: https://sites.google.com/view/mcnn-imitation
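The core idea described above can be illustrated with a minimal sketch: the network's raw output is blended toward the expert action of the nearest "memory" (prototype) training sample, with the permissible deviation shrinking to zero at the memory itself and growing with distance from it. This is only a hypothetical simplification under assumed design choices; the exact constraint function, the hyperparameters `lam` and `beta`, and the name `mcnn_predict` are illustrative, not the paper's implementation.

```python
import numpy as np

def mcnn_predict(x, memories_x, memories_y, net, lam=1.0, beta=1.0):
    """Constrain net(x) to stay near the action of the nearest memory.

    x:          (d_in,) query state
    memories_x: (M, d_in) prototype training states
    memories_y: (M, d_out) corresponding expert actions
    net:        callable mapping a state to an unconstrained action
    lam, beta:  hypothetical hyperparameters setting the size of the
                permissible region and how fast it grows with distance
    """
    dists = np.linalg.norm(memories_x - x, axis=1)
    i = int(np.argmin(dists))          # index of the nearest memory
    d = dists[i]
    # Deviation budget: 0 exactly at the memory, saturating at lam far away.
    budget = lam * (1.0 - np.exp(-beta * d))
    raw = net(x)
    # Hard-constrain the output inside the permissible region around
    # the memory action: |output - memories_y[i]| < budget <= lam.
    return memories_y[i] + budget * np.tanh(raw - memories_y[i])
```

At a memory point the policy reproduces the expert action exactly, and far from all memories the output can deviate from the nearest expert action by at most `lam`, which is the mechanism behind the bounded sub-optimality gap claimed in the abstract.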