
Automatic Counterfactual Augmentation for Robust Text Classification Based on Word-Group Search (2307.01214v1)

Published 1 Jul 2023 in cs.CL and cs.AI

Abstract: Although large-scale pre-trained LLMs have achieved striking results for text classification, recent work has raised concerns about the challenge of shortcut learning. In general, a keyword is regarded as a shortcut if it creates a superficial association with the label, resulting in a false prediction. Conversely, shortcut learning can be mitigated if the model relies on robust causal features that help produce sound predictions. To this end, many studies have explored post-hoc interpretable methods to mine shortcuts and causal features for robustness and generalization. However, most existing methods focus only on a single word in a sentence and do not consider word-groups, leading to wrong causal features. To solve this problem, we propose a new Word-Group mining approach, which captures the causal effect of any keyword combination and ranks the combinations that most affect the prediction. Our approach is based on effective post-hoc analysis and beam search, which ensures the mining effect and reduces the complexity. We then build a counterfactual augmentation method based on the multiple word-groups and use an adaptive voting mechanism to learn the influence of different augmented samples on the prediction results, forcing the model to pay attention to effective causal features. We demonstrate the effectiveness of the proposed method on several tasks over 8 affective review datasets and 4 toxic language datasets, including cross-domain text classification, text attacks, and gender fairness tests.
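The abstract describes the word-group mining step only at a high level. Below is a minimal sketch of how a beam search over word-groups could rank keyword combinations by how strongly masking them changes a classifier's prediction, in the spirit of the approach described above. It assumes a Hugging Face text-classification pipeline; the helper names (`predict_proba`, `causal_effect`, `beam_search_word_groups`), the masking strategy, and the scoring heuristic are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: beam search over word-groups ranked by the drop
# in the predicted-label probability when the group is masked out.
from transformers import pipeline

# Any fine-tuned text classifier can stand in for the model under analysis.
clf = pipeline("sentiment-analysis")

def predict_proba(text, label):
    """Probability the classifier assigns to `label` for `text`.
    Assumes a recent transformers version where top_k=None returns all labels."""
    for out in clf(text, top_k=None):
        if out["label"] == label:
            return out["score"]
    return 0.0

def causal_effect(tokens, group, label, mask="[MASK]"):
    """Proxy for the causal effect of a word-group: probability drop after masking it."""
    original = predict_proba(" ".join(tokens), label)
    masked = [mask if i in group else t for i, t in enumerate(tokens)]
    return original - predict_proba(" ".join(masked), label)

def beam_search_word_groups(text, label, beam_width=3, max_group_size=3):
    """Return word-groups (as token-index tuples) ordered by their effect on the prediction."""
    tokens = text.split()
    # Start from single-word candidates, then grow the most promising groups.
    beam = [((i,), causal_effect(tokens, {i}, label)) for i in range(len(tokens))]
    beam = sorted(beam, key=lambda x: -x[1])[:beam_width]
    results = list(beam)
    for _ in range(max_group_size - 1):
        candidates = []
        for group, _ in beam:
            for i in range(len(tokens)):
                if i in group:
                    continue
                new_group = tuple(sorted(set(group) | {i}))
                candidates.append((new_group, causal_effect(tokens, set(new_group), label)))
        beam = sorted(set(candidates), key=lambda x: -x[1])[:beam_width]
        results.extend(beam)
    return sorted(results, key=lambda x: -x[1])

# The top-ranked groups are natural targets for counterfactual substitution,
# e.g. replacing them with antonyms or neutral words to build augmented samples.
groups = beam_search_word_groups("the plot was dull but the acting saved it", "POSITIVE")
```

Restricting growth to the top `beam_width` groups keeps the search linear in sentence length per group size, rather than enumerating all keyword combinations, which matches the complexity argument made in the abstract.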

