Efficient Sentiment Analysis: A Resource-Aware Evaluation of Feature Extraction Techniques, Ensembling, and Deep Learning Models (2308.02022v2)

Published 3 Aug 2023 in cs.CL

Abstract: In the pursuit of NLP systems that maximize accuracy, other important metrics of system performance are often overlooked. Prior models are easily forgotten despite their possible suitability in settings where large computing resources are unavailable or relatively more costly. In this paper, we perform a broad comparative evaluation of document-level sentiment analysis models with a focus on resource costs that are important for the feasibility of model deployment and general climate consciousness. Our experiments consider different feature extraction techniques, the effect of ensembling, task-specific deep learning modeling, and domain-independent LLMs. We find that while a fine-tuned LLM achieves the best accuracy, some alternate configurations provide huge (up to 24,283×) resource savings for a marginal (<1%) loss in accuracy. Furthermore, we find that for smaller datasets, the differences in accuracy shrink while the differences in resource consumption grow further.
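The accuracy-versus-cost trade-off described above can be made concrete with a short sketch. The snippet below is not the authors' code; it trains a lightweight TF-IDF plus logistic regression sentiment classifier with scikit-learn and records wall-clock training time as a crude proxy for the resource metrics the paper measures. The toy corpus and its labels are hypothetical placeholders.

```python
# Minimal sketch (not the paper's pipeline): a resource-frugal
# document-level sentiment classifier, timed as a rough cost proxy.
import time

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; the paper evaluates real review datasets
# (e.g., product and restaurant reviews).
docs = [
    "The food was wonderful and the staff were friendly.",
    "Terrible service, I will never come back.",
    "Absolutely loved the ambience and the dessert.",
    "The product broke after two days, very disappointing.",
] * 50  # repeated so the train/test split has enough samples
labels = [1, 0, 1, 0] * 50  # 1 = positive, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.2, random_state=0
)

# Lightweight alternative to a fine-tuned language model:
# sparse TF-IDF features feeding a linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)

start = time.perf_counter()
model.fit(X_train, y_train)
elapsed = time.perf_counter() - start

print(f"train time: {elapsed:.3f}s, "
      f"accuracy: {model.score(X_test, y_test):.3f}")
```

For an energy or emissions estimate closer to what the paper reports, the same fit call can be wrapped in a tracker such as codecarbon's EmissionsTracker (started before training and stopped after); the paper's exact measurement tooling is not restated here.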

Authors (2)
  1. Mahammed Kamruzzaman
  2. Gene Louis Kim