
Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels (2407.15786v1)

Published 22 Jul 2024 in cs.LG and cs.AI

Abstract: Recent advances in reinforcement learning (RL) have predominantly leveraged neural network-based policies for decision-making, yet these models often lack interpretability, posing challenges for stakeholder comprehension and trust. Concept bottleneck models offer an interpretable alternative by integrating human-understandable concepts into neural networks. However, a significant limitation of prior work is the assumption that human annotations for these concepts are readily available during training, necessitating continuous real-time input from human annotators. To overcome this limitation, we introduce a novel training scheme that enables RL algorithms to efficiently learn a concept-based policy by querying humans to label only a small set of data, or in the extreme case, without any human labels. Our algorithm, LICORICE, involves three main contributions: interleaving concept learning and RL training, using a concept ensemble to actively select informative data points for labeling, and decorrelating the concept data with a simple strategy. We show how LICORICE reduces manual labeling efforts to 500 or fewer concept labels in three environments. Finally, we present an initial study exploring how powerful vision-language models can infer concepts from raw visual inputs without explicit labels, at minimal cost to performance.
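The active-selection idea in the abstract can be illustrated with a small sketch: an ensemble of concept predictors scores unlabeled states by disagreement, and the most disagreed-upon states are sent to the human labeler. This is a hypothetical query-by-committee-style sketch using prediction variance, not the authors' implementation; function names and the budget parameter are assumptions for illustration.

```python
import numpy as np

def ensemble_disagreement(ensemble_preds):
    """Per-sample disagreement score: variance of the predicted concept
    values across ensemble members, averaged over concepts.
    ensemble_preds has shape (n_members, n_samples, n_concepts)."""
    return ensemble_preds.var(axis=0).mean(axis=1)

def select_queries(ensemble_preds, budget):
    """Return indices of the `budget` samples the ensemble disagrees on
    most; these would be sent to a human for concept labeling."""
    scores = ensemble_disagreement(ensemble_preds)
    return np.argsort(scores)[::-1][:budget]
```

With a fixed labeling budget, a loop over training iterations would alternate between (a) fitting the concept ensemble on the labeled pool, (b) spending part of the budget via `select_queries` on freshly collected states, and (c) updating the RL policy on top of the predicted concepts — mirroring the interleaving described in the abstract.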

Authors (4)
  1. Zhuorui Ye (7 papers)
  2. Stephanie Milani (23 papers)
  3. Geoffrey J. Gordon (30 papers)
  4. Fei Fang (103 papers)
Citations (2)
