Self-Supervised Interpretable End-to-End Learning via Latent Functional Modularity (2403.18947v2)
Abstract: We introduce MoNet, a novel functionally modular network for self-supervised and interpretable end-to-end learning. By leveraging its functional modularity with a latent-guided contrastive loss function, MoNet efficiently learns task-specific decision-making processes in latent space without requiring task-level supervision. Moreover, our method incorporates an online, post-hoc explainability approach that enhances the interpretability of end-to-end inferences without compromising sensorimotor control performance. In real-world indoor environments, MoNet demonstrates effective visual autonomous navigation, outperforming baseline models by 7% to 28% in task specificity analysis. We further explore the interpretability of our network through post-hoc analysis of perceptual saliency maps and latent decision vectors. This provides valuable insights into the incorporation of explainable artificial intelligence into robotic learning, encompassing both perceptual and behavioral perspectives. Supplementary materials are available at https://sites.google.com/view/monet-lgc.