Enhancing the Rationale-Input Alignment for Self-explaining Rationalization (2312.04103v2)
Abstract: Rationalization empowers deep learning models with self-explaining capabilities through a cooperative game, where a generator selects a semantically consistent subset of the input as a rationale, and a subsequent predictor makes predictions based on the selected rationale. In this paper, we discover that rationalization is prone to a problem named \emph{rationale shift}, which arises from the algorithmic bias of the cooperative game. Rationale shift refers to a situation where the semantics of the selected rationale may deviate from the original input, yet the predictor still produces accurate predictions from the deviated rationale, leaving the generator compromised by misleading feedback. To address this issue, we first demonstrate the importance of the alignment between the rationale and the full input through both empirical observations and theoretical analysis. Subsequently, we introduce a novel approach called DAR (\textbf{D}iscriminatively \textbf{A}ligned \textbf{R}ationalization), which utilizes an auxiliary module pretrained on the full input to discriminatively align the selected rationale with the original input. We theoretically illustrate how DAR accomplishes the desired alignment, thereby overcoming the rationale shift problem. Experiments on two widely used real-world benchmarks show that the proposed method significantly improves explanation quality (measured by the overlap between the model-selected explanation and the human-annotated rationale) compared to state-of-the-art techniques. Additionally, results in two synthetic settings further validate the effectiveness of DAR in addressing the rationale shift problem.
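The generator-predictor game and the alignment term described above can be sketched as follows. This is a hypothetical, minimal illustration, not the paper's implementation: the generator is reduced to top-k selection over token scores, the predictor to mean-pooling plus a logistic unit, and the "auxiliary module pretrained on the full input" is stood in for by the mean embedding of the full input; all names (`generator_select`, `alignment_penalty`, etc.) are invented for this sketch.

```python
# Hypothetical sketch of cooperative rationalization with a DAR-style
# alignment penalty. Not the authors' code; numpy stands in for a real
# deep learning stack.
import numpy as np

rng = np.random.default_rng(0)

def generator_select(scores, k):
    """Generator: keep the k highest-scoring tokens as a binary rationale mask."""
    mask = np.zeros_like(scores)
    mask[np.argsort(scores)[-k:]] = 1.0
    return mask

def predictor(embeddings, mask, w):
    """Predictor: mean-pool the rationale tokens only, then apply a logistic unit."""
    pooled = (embeddings * mask[:, None]).sum(axis=0) / max(mask.sum(), 1.0)
    return 1.0 / (1.0 + np.exp(-pooled @ w))

def alignment_penalty(embeddings, mask, full_input_stats):
    """DAR-style term: distance between rationale features and features of the
    full input (here, simply its mean embedding). A small value means the
    rationale's semantics stay close to the original input, discouraging
    rationale shift."""
    pooled = (embeddings * mask[:, None]).sum(axis=0) / max(mask.sum(), 1.0)
    return float(np.linalg.norm(pooled - full_input_stats))

# Toy example: one input of 8 tokens with 4-dimensional embeddings.
emb = rng.normal(size=(8, 4))
scores = rng.normal(size=8)          # generator's per-token scores
mask = generator_select(scores, k=3)
w = rng.normal(size=4)               # predictor weights
full_mean = emb.mean(axis=0)         # stand-in for the pretrained full-input module

p = predictor(emb, mask, w)
# Total loss = task loss (negative log-likelihood for a positive label)
# + weighted alignment penalty, jointly shaping the generator's selection.
loss = -np.log(p) + 0.1 * alignment_penalty(emb, mask, full_mean)
```

In the actual method the alignment is learned discriminatively by an auxiliary network rather than computed as a fixed distance; the sketch only conveys how an alignment term enters the cooperative objective.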