
Comparing BERT against traditional machine learning text classification (2005.13012v2)

Published 26 May 2020 in cs.CL, cs.LG, and stat.ML

Abstract: The BERT model has arisen in recent years as a popular state-of-the-art machine learning model that can cope with multiple NLP tasks such as supervised text classification without human supervision. Its flexibility to handle any type of corpus while delivering great results has made this approach very popular, not only in academia but also in industry, although many other approaches have been used with success over the years. In this work, we first present BERT and include a brief review of classical NLP approaches. Then, we empirically test, in a suite of experiments covering different scenarios, the behaviour of BERT against traditional TF-IDF vocabularies fed to machine learning algorithms. The purpose of this work is to add empirical evidence supporting or refuting the use of BERT as a default choice in NLP tasks. The experiments show the superiority of BERT and its independence from features of the NLP problem, such as the language of the text, adding empirical evidence for using BERT as a default technique in NLP problems.

Citations (207)

Summary

  • The paper demonstrates that BERT significantly outperforms traditional ML methods, achieving accuracies up to 93.87% in text classification tasks.
  • The methodology involves four experimental setups, including IMDB sentiment analysis, disaster tweet classification, Portuguese news categorization, and Chinese hotel review sentiment analysis.
  • The study highlights BERT’s advantages in transfer learning, multilingual adaptability, and ease of implementation, suggesting a paradigm shift in NLP approaches.

Comparative Analysis of BERT and Traditional Machine Learning Techniques for Text Classification

This paper by Santiago Gonzalez-Carvajal and Eduardo C. Garrido-Merchan presents a comprehensive evaluation of Bidirectional Encoder Representations from Transformers (BERT) against traditional machine learning paradigms for text classification. Its principal aim is to provide empirical evidence supporting the use of BERT as a default methodology for NLP tasks, challenging classical approaches that have historically relied on features such as TF-IDF.

Introduction to Methodologies

The exploration begins with a delineation of traditional NLP techniques, largely dominated by machine learning (ML) models that rely on TF-IDF for feature extraction. These classical methods are juxtaposed against BERT, a deep learning model that exploits bidirectional encoder representations and is fine-tuned on specific NLP tasks after a comprehensive pre-training phase on large, unlabeled text corpora.
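To ground the comparison, the classical pipeline can be sketched in a few lines of scikit-learn: raw text is turned into sparse TF-IDF vectors and a linear classifier is fit on top. This is a minimal illustration under stated assumptions, not the authors' exact setup; the 20 Newsgroups corpus, feature settings, and classifier are stand-ins for whichever labeled dataset and ML model are being evaluated.

```python
# Classical baseline sketch: TF-IDF features fed to a linear classifier.
# 20 Newsgroups is used purely as a stand-in labeled text corpus.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

train = fetch_20newsgroups(subset="train", categories=["sci.med", "rec.autos"])
test = fetch_20newsgroups(subset="test", categories=["sci.med", "rec.autos"])

# Turn raw text into sparse TF-IDF vectors (unigrams and bigrams).
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

# Fit a simple linear model on those features and measure accuracy.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train.target)
print("accuracy:", accuracy_score(test.target, clf.predict(X_test)))
```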

Experimental Framework and Results

The researchers designed four distinct experimental setups to compare the performance of BERT against conventional ML techniques across languages and domains. In all four, the results exhibit BERT’s superiority (a fine-tuning sketch follows the list):

  1. IMDB Experiment: For sentiment analysis of movie reviews, BERT achieved an accuracy of 93.87%, outstripping models such as Logistic Regression and Linear SVC, which hovered around 89-90%.
  2. RealOrNot Tweets Classification: Focused on distinguishing real disaster-related tweets from those that are not, BERT secured 83.61% accuracy against a stacked-ensemble AutoML approach that attained 77.5%.
  3. Portuguese News Categorization: Showcased BERT's multilingual prowess with 90.93% accuracy on a multi-class news dataset, surpassing a traditional Gradient Boosting classifier that lagged at 84.8%.
  4. Chinese Hotel Reviews Sentiment Analysis: Testing BERT's adaptability across different language scripts, the model achieved 93.81% accuracy, markedly ahead of conventional models based on Gradient Boosting.
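For contrast, the BERT side of such a comparison amounts to fine-tuning a pretrained checkpoint on the labeled task, as in the IMDB experiment above. The sketch below uses the Hugging Face `transformers` and `datasets` libraries; the model name, sequence length, and training hyperparameters are illustrative assumptions rather than the settings reported in the paper.

```python
# Minimal sketch: fine-tune bert-base-uncased for binary sentiment
# classification on IMDB. Hyperparameters are illustrative, not the paper's.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate/pad reviews to a fixed length so they can be batched.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="bert-imdb",          # checkpoints land here
    num_train_epochs=2,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
print(trainer.evaluate())  # reports eval loss; add compute_metrics for accuracy
```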

Implications and Future Directions

The empirical dominance of BERT across varied datasets underscores its value as a robust, flexible, and less labor-intensive alternative to traditional NLP methodologies. Notably, BERT’s reliance on substantial pre-training and transfer learning warrants attention, particularly in environments with limited labeled data.

The findings indicate a shift towards deep learning models in NLP, highlighting critical advancements like transfer learning, which merit further exploration. Future work could delve into enhancing BERT with hyperparameter optimization techniques, such as Bayesian Optimization, to tailor it efficiently for diverse NLP applications. Moreover, leveraging BERT's capabilities for sophisticated language interpretation in AI systems, including robotics, could pave new pathways for integrating NLP in intelligent systems.
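To make the hyperparameter-optimization suggestion concrete, the loop below sketches Bayesian optimization with scikit-optimize's `gp_minimize`. For brevity the objective scores the cheap TF-IDF baseline via cross-validation; in the setting the authors envision, the same loop would wrap a far more expensive BERT fine-tuning run, with the learning rate and other training knobs as the search space. The dataset and search space here are illustrative assumptions.

```python
# Sketch of Bayesian optimization over a single hyperparameter (log10 of the
# regularization strength C) using a Gaussian-process surrogate.
from skopt import gp_minimize
from skopt.space import Real
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

data = fetch_20newsgroups(subset="train", categories=["sci.med", "rec.autos"])
X = TfidfVectorizer(max_features=20_000).fit_transform(data.data)

def objective(params):
    (log_c,) = params
    clf = LogisticRegression(C=10 ** log_c, max_iter=1000)
    # gp_minimize minimizes, so return the negative cross-validated accuracy.
    return -cross_val_score(clf, X, data.target, cv=3).mean()

result = gp_minimize(objective, [Real(-3, 3, name="log_C")],
                     n_calls=20, random_state=0)
print("best C:", 10 ** result.x[0], "cv accuracy:", -result.fun)
```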

Conclusion

Through rigorous analyses and varied experimental conditions, this paper illuminates the competency of BERT, affirming its preeminence over conventional ML models. Given its superior performance and ease of implementation, BERT stands out as an invaluable asset in the standard NLP toolkit, with promising prospects for future enhancement and applications in AI-driven language processing.