
Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint (2305.13599v3)

Published 23 May 2023 in cs.LG and cs.CL

Abstract: A self-explaining rationalization model is generally constructed as a cooperative game in which a generator selects the most human-intelligible pieces of the input text as rationales, and a predictor then makes predictions based on the selected rationales. However, such a cooperative game may incur the degeneration problem: the predictor overfits to the uninformative pieces produced by a not yet well-trained generator and, in turn, leads the generator to converge to a sub-optimal model that tends to select senseless pieces. In this paper, we theoretically bridge degeneration with the predictor's Lipschitz continuity. Then, we empirically propose a simple but effective method named DR, which can naturally and flexibly restrain the Lipschitz constant of the predictor, to address the problem of degeneration. The main idea of DR is to decouple the generator and predictor and assign them asymmetric learning rates. A series of experiments conducted on two widely used benchmarks has verified the effectiveness of the proposed method. Code: https://github.com/jugechengzi/Rationalization-DR
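To make the idea of asymmetric learning rates concrete, the following is a minimal sketch and not the authors' implementation. It treats the generator and predictor as two separately named parameter groups, each updated by plain SGD with its own step size. The scalar parameters, the gradient values, and the choice to give the predictor the smaller rate are all illustrative assumptions, since the abstract does not state the ratio or its direction. The helper `empirical_lipschitz` is likewise hypothetical: it computes a simple sample-based lower bound on the Lipschitz constant of a scalar function, which for a linear predictor f(x) = w·x equals |w|.

```python
from itertools import combinations


def asymmetric_sgd_step(params, grads, lrs):
    """One plain SGD step in which each named module has its own learning rate."""
    return {name: params[name] - lrs[name] * grads[name] for name in params}


def empirical_lipschitz(f, xs):
    """Sample-based lower bound on the Lipschitz constant of a scalar function f."""
    return max(
        abs(f(a) - f(b)) / abs(a - b)
        for a, b in combinations(xs, 2)
        if a != b
    )


# Toy scalars standing in for the two modules (hypothetical values).
params = {"generator": 1.0, "predictor": 1.0}
grads = {"generator": 0.5, "predictor": 0.5}

# Assumption for illustration only: the predictor gets the smaller rate.
lrs = {"generator": 1e-2, "predictor": 1e-3}

updated = asymmetric_sgd_step(params, grads, lrs)

# With identical gradients, the predictor weight moves ten times less.
generator_shift = abs(updated["generator"] - params["generator"])
predictor_shift = abs(updated["predictor"] - params["predictor"])

# For a linear predictor f(x) = w * x, the estimate recovers |w|.
w = updated["predictor"]
lipschitz_estimate = empirical_lipschitz(lambda x: w * x, [0.0, 1.0, 2.0])
```

In a deep-learning framework the same decoupling would typically be expressed by giving the two modules separate optimizers or separate parameter groups; the sketch above only shows the update rule itself.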
