
Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs (2401.16638v1)

Published 30 Jan 2024 in cs.CL and cs.AI

Abstract: Fine-tuning large pre-trained LLMs on particular datasets is a commonly employed strategy in NLP classification tasks. However, this approach usually results in a loss of the model's generalizability. In this paper, we present a framework that maintains generalizability and enhances performance on the downstream task by utilizing task-specific context attribution. We show that a linear transformation of the text representation from any transformer model using the task-specific concept operator results in a projection onto the latent concept space, referred to as context attribution in this paper. The concept operator is optimized during the supervised learning stage via novel loss functions. The proposed framework demonstrates that context attribution of the text representation for each task objective can improve the capacity of the discriminator function and thus achieve better performance on the classification task. Experimental results on three datasets, namely HateXplain, IMDB reviews, and Social Media Attributions, illustrate that the proposed model attains superior accuracy and generalizability. Specifically, for the non-fine-tuned BERT on the HateXplain dataset, we observe an 8% improvement in accuracy and a 10% improvement in F1-score, while for the IMDB dataset, the fine-tuned state-of-the-art XLNet is outperformed by 1% in both accuracy and F1-score. Furthermore, in an out-of-domain cross-dataset test, DistilBERT fine-tuned on the IMDB dataset in conjunction with the proposed model improves the F1-score on the HateXplain dataset by 7%. For the Social Media Attributions dataset of YouTube comments, we observe a 5.2% increase in F1-score. The proposed framework is implemented with PyTorch and released open-source on GitHub.
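The following is a minimal PyTorch sketch of the idea described in the abstract: a frozen pre-trained transformer supplies the text representation, a learned linear concept operator projects it onto a task-specific latent concept space (the context attribution), and a lightweight discriminator classifies the projected vector. The class names, dimensions, and the plain cross-entropy objective are assumptions for illustration; the authors' open-source implementation and their novel loss functions may differ.

```python
import torch
import torch.nn as nn

class ContextAttributionHead(nn.Module):
    """Sketch of task-specific context attribution (assumed structure).

    A linear "concept operator" maps the pooled output of a frozen,
    non-fine-tuned transformer (e.g. BERT or DistilBERT) onto a latent
    concept space; a discriminator then classifies that projection.
    """

    def __init__(self, hidden_dim: int = 768, concept_dim: int = 64, num_classes: int = 2):
        super().__init__()
        # Task-specific concept operator: linear projection onto the concept space.
        self.concept_operator = nn.Linear(hidden_dim, concept_dim, bias=False)
        # Discriminator acting on the context attribution.
        self.classifier = nn.Linear(concept_dim, num_classes)

    def forward(self, text_representation: torch.Tensor) -> torch.Tensor:
        # text_representation: (batch, hidden_dim) pooled transformer embedding.
        context_attribution = self.concept_operator(text_representation)
        return self.classifier(context_attribution)

# Usage sketch: the pre-trained encoder stays frozen; only the concept
# operator and the classifier are trained on the downstream labels.
if __name__ == "__main__":
    head = ContextAttributionHead()
    pooled = torch.randn(8, 768)  # stand-in for frozen BERT [CLS] embeddings
    logits = head(pooled)
    loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,)))
    loss.backward()
```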
