Sentiment Analysis of Review Datasets Using Naive Bayes and K-NN Classifier (1610.09982v1)

Published 31 Oct 2016 in cs.IR and cs.CL

Abstract: The advent of Web 2.0 has led to an increase in the amount of sentimental content available on the Web. Such content is often found on social media web sites in the form of movie or product reviews, user comments, testimonials, messages in discussion forums, etc. Timely discovery of sentimental or opinionated web content has a number of advantages, the most important being monetization. Understanding the sentiments of the human masses towards different entities and products enables better services for contextual advertisements, recommendation systems and analysis of market trends. The focus of our project is a sentiment-focused web crawling framework to facilitate the quick discovery of sentimental content in movie reviews and hotel reviews and the analysis of the same. We use statistical methods to capture elements of subjective style and sentence polarity. The paper discusses in detail two supervised machine learning algorithms, K-Nearest Neighbour (K-NN) and Naive Bayes, and compares their overall accuracy, precision and recall values. For movie reviews, Naive Bayes gave far better results than K-NN, but for hotel reviews the two algorithms gave lower, almost identical accuracies.

Authors (5)

Lopamudra Dey
Sanjay Chakraborty
Anuraag Biswas
Beepa Bose
Sweta Tiwari

Citations (264)

Summary

Sentiment Analysis of Review Datasets using Naïve Bayes and K-Nearest Neighbour Classifier

This paper applies two supervised machine learning algorithms, Naïve Bayes and K-Nearest Neighbour (K-NN), to sentiment analysis of movie and hotel reviews. Sentiment analysis, often referred to as opinion mining, is increasingly important for understanding customer opinions in reviews and other user-generated content. The paper aims to determine how effectively each algorithm classifies sentiment as positive or negative.

Methodology

The authors use movie reviews collected from www.imdb.com and hotel reviews from the OpinRank Review Dataset, each set consisting of 5000 positive and 5000 negative reviews. The reviews are analyzed at the phrase level, meaning that sentiment is detected within a sentence rather than for the document as a whole. The Naïve Bayes classifier is a probabilistic model based on Bayes' theorem that assumes words occur independently of one another given the class, a simplification that works well in practice for text categorization. In contrast, K-NN is an instance-based classifier: it defers computation until it is asked to classify an item, and then examines the k nearest instances in the training data.
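The contrast between the two classifiers can be made concrete with a minimal sketch. Everything in it is an assumption for illustration rather than the authors' implementation: the whitespace tokenizer, the add-one smoothing, the cosine similarity used for K-NN, the class names `NaiveBayes` and `KNN`, and the six toy training sentences, which are invented and not drawn from the paper's datasets.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # log P(class) + sum of log P(word | class); words treated as independent
            score = math.log(n_docs / total_docs)
            for word in tokenize(doc):
                if word not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KNN:
    """Instance-based classifier: all work happens at prediction time."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, docs, labels):
        # "Training" only stores the vectorized examples.
        self.examples = [(Counter(tokenize(d)), l) for d, l in zip(docs, labels)]
        return self

    def predict(self, doc):
        query = Counter(tokenize(doc))
        ranked = sorted(self.examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Invented toy data, for illustration only.
train_docs = [
    "a wonderful moving film",
    "great acting and a great story",
    "loved the wonderful cast",
    "a dull boring film",
    "terrible acting and a boring story",
    "hated the dull plot",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

nb = NaiveBayes().fit(train_docs, train_labels)
knn = KNN(k=3).fit(train_docs, train_labels)

print(nb.predict("a wonderful story with great acting"))
print(knn.predict("boring and dull"))
```

Note the asymmetry in cost: `NaiveBayes.fit` does the counting up front and prediction is a handful of lookups, whereas `KNN.fit` is trivial and every prediction compares the query against all stored examples.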

Results

The experiments show that the relative performance of the two algorithms depends on the type of review dataset. For movie reviews, the Naïve Bayes classifier consistently outperformed the K-NN classifier, achieving an accuracy exceeding 80% in multiple rounds of experiments. For hotel reviews, however, the accuracy of both classifiers was notably lower and nearly the same, suggesting that the semantic and contextual richness of hotel reviews may pose additional challenges that these algorithms do not capture effectively.
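For reference, accuracy, precision and recall (the three measures the abstract says are compared) can be computed for a binary sentiment task as in the sketch below. The function name `evaluate` and the label lists are hypothetical and chosen only to exercise the formulas; they are not results from the paper.

```python
def evaluate(y_true, y_pred, positive="pos"):
    """Return accuracy, precision and recall, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels and predictions, for illustration only.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "pos", "neg", "neg"]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

On balanced data such as the 5000/5000 split described above, accuracy is a reasonable headline number; precision and recall show whether errors are concentrated in one class.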

Implications and Future Directions

The findings underscore the Naïve Bayes classifier's efficacy on movie reviews, likely due to its ability to perform well with limited training data. However, the paper highlights the need for more sophisticated algorithms that can better handle the varied contextual nuances of hotel reviews. Future research could explore hybrid models that combine the strengths of Naïve Bayes and K-NN, or alternative classifiers such as Random Forest and Support Vector Machines (SVMs), which may offer improved performance.

While the paper does not address all challenges associated with sentiment analysis, it provides a comparative analysis of simple yet powerful algorithms, thus contributing valuable insights into machine learning applications in natural language processing. As the field progresses, incorporating such insights will be crucial for developing more robust and contextually aware sentiment analysis tools.
