Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques

Published 20 May 2024 in cs.CL and cs.LG | arXiv:2405.11775v1

Abstract: Ordinal Classification (OC) is a widely encountered challenge in NLP, with applications in various domains such as sentiment analysis, rating prediction, and more. Previous approaches to OC have primarily focused on modifying existing loss functions, or creating novel ones, that explicitly account for the ordinal nature of labels. However, with the advent of Pre-trained Language Models (PLMs), it became possible to tackle ordinality through the implicit semantics of the labels as well. This paper provides a comprehensive theoretical and empirical examination of both approaches. We also offer strategic recommendations regarding the most effective approach to adopt for specific settings.
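
To make the explicit/implicit distinction concrete, here is a minimal PyTorch sketch. It is not taken from the paper; the function names, the particular distance-weighted loss variant, and the verbalizer list are illustrative assumptions. The first function shows the "explicit" route, a loss whose penalty grows with the ordinal distance between the predicted and gold labels; the second shows the "implicit" route, where a PLM's scores over semantically ordered verbalizer tokens stand in for class probabilities.

```python
# Minimal sketch, assuming PyTorch; names and the specific loss variant are
# illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def ordinal_log_loss(logits, targets, alpha=1.0):
    """'Explicit' route: penalise probability mass in proportion to its
    ordinal distance from the gold label (labels are integers 0..K-1)."""
    num_classes = logits.size(-1)
    probs = F.softmax(logits, dim=-1)                                   # (B, K)
    classes = torch.arange(num_classes, device=logits.device)           # (K,)
    dist = (classes.unsqueeze(0) - targets.unsqueeze(1)).abs().float()  # (B, K)
    # -sum_k |k - y|^alpha * log(1 - p_k): mass far from the gold label is costly
    loss = -(dist.pow(alpha) * torch.log1p(-probs.clamp(max=1 - 1e-7))).sum(-1)
    return loss.mean()

def verbalizer_scores(mask_logits, verbalizer_ids):
    """'Implicit' route: read a PLM's vocabulary logits at a masked label slot
    and keep only the tokens of an ordered verbalizer (e.g. terrible < bad <
    okay < good < great), so label semantics carry the ordinal structure."""
    return torch.softmax(mask_logits[verbalizer_ids], dim=-1)

# Toy usage: 5-way rating prediction with labels 0..4
logits = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))
print(ordinal_log_loss(logits, labels))
```

A standard cross-entropy loss treats predicting "1 star" and "4 stars" for a true "5 stars" as equally wrong; the distance-weighted variant above is one way explicit methods encode that the former error is worse.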
