A Comparative Study on TF-IDF feature Weighting Method and its Analysis using Unstructured Dataset (2308.04037v1)

Published 8 Aug 2023 in cs.CL and cs.LG

Abstract: Text classification is the process of assigning text to relevant categories, and its algorithms are at the core of many NLP applications. Term Frequency-Inverse Document Frequency (TF-IDF) and NLP are the most widely used information retrieval methods in text classification. We have investigated and analyzed the feature weighting method for text classification on unstructured data. The proposed model considers two features, N-Grams and TF-IDF, on the IMDB movie reviews and Amazon Alexa reviews datasets for sentiment analysis. We then used state-of-the-art classifiers to validate the method, i.e., Support Vector Machine (SVM), Logistic Regression, Multinomial Naive Bayes (Multinomial NB), Random Forest, Decision Tree, and k-nearest neighbors (KNN). Of the two feature extraction methods, TF-IDF features gave a significant improvement over N-Gram features. TF-IDF achieved the maximum accuracy (93.81%), precision (94.20%), recall (93.81%), and F1-score (91.99%) with the Random Forest classifier.
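
To make the two feature types in the abstract concrete, the following is a minimal sketch of TF-IDF weighting over N-gram terms. It assumes the common textbook variant (term frequency normalized by document length, idf = log(N / df)); the abstract does not state which variant the authors used. The function names `ngrams` and `tfidf` and the three toy reviews are hypothetical and are not drawn from the IMDB or Amazon Alexa datasets.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute one {term: weight} dict per document over n-gram terms.

    Assumed variant (not confirmed by the paper):
      tf  = occurrences of the term in the document / terms in the document
      idf = log(number of documents / documents containing the term)
    """
    # Whitespace tokenization on lowercased text keeps the sketch simple.
    term_lists = [ngrams(doc.lower().split(), n) for doc in docs]

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    num_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append(
            {t: (c / total) * math.log(num_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Toy reviews, invented for illustration only.
reviews = [
    "great movie great cast",
    "terrible movie",
    "great sound quality",
]
unigram_weights = tfidf(reviews, n=1)
bigram_weights = tfidf(reviews, n=2)
```

In this sketch a term that occurs in every document gets weight zero, while a term confined to one short review, such as "terrible", gets a high weight. The resulting per-document weights are what a classifier such as those listed in the abstract would consume as features.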

Authors (3)
  1. Mamata Das (5 papers)
  2. P. J. A. Alphonse (5 papers)
  3. Selvakumar K. (4 papers)
Citations (32)