- The paper introduces W2VLDA, which reduces the need for extensive labeled data by using minimal seed words for configuring aspects and sentiment classes.
- Its almost unsupervised framework integrates LDA-based topic modeling with word embeddings and Brown clusters to separate aspect terms from opinion words.
- Evaluation on multilingual datasets reveals that W2VLDA outperforms traditional LDA methods, demonstrating its scalability and domain adaptability.
Overview of W2VLDA: An Almost Unsupervised System for Aspect-Based Sentiment Analysis
Introduction
The paper introduces W2VLDA, a novel, almost unsupervised system for Aspect-Based Sentiment Analysis (ABSA). It combines topic modeling, continuous word embeddings, and a minimal initial configuration to classify text into domain-specific aspects and sentiment polarities. The need for automated sentiment analysis arises from the sheer volume of customer reviews proliferating across digital platforms. Traditional supervised approaches, while effective, demand extensive manually labeled datasets, which are costly and impractical to produce across multiple domains and languages. W2VLDA aims to overcome these limitations with reduced supervision while remaining applicable in multilingual and multi-domain settings.
Methodology
The central innovation of W2VLDA lies in its methodological framework that incorporates different unsupervised techniques leading to a cohesive system capable of performing three simultaneous tasks: aspect classification, sentiment polarity classification, and aspect-term/opinion-word separation.
- Topic and Sentiment Configuration: W2VLDA requires a minimal domain-specific configuration setup where a single seed word per aspect and polarity (positive/negative) suffices. This simplicity enables the system’s application across various languages and domains with low adaptation effort.
- Aspect-Term and Opinion-Word Separation: The paper details an unsupervised method that uses Brown clusters to distinguish aspect terms from opinion words. Because the separation process does not rely on language-specific resources, the system adapts readily to new languages.
- Topic Modeling Integration: At its core, W2VLDA adopts an LDA-based topic model, enriched with semantic guidance from word embeddings which inform hyperparameter biases during model training. The integration of word similarity measures derived from embeddings ensures the alignment of topics and sentiment modeling towards user-defined goals.
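The embedding-guided biasing described above can be illustrated with a minimal sketch. The vectors, seed words, and the `aspect_bias` helper below are hypothetical toy examples (the paper does not specify this exact formulation); the idea is simply that a word's cosine similarity to each aspect's seed word yields a pseudo-count that could skew an asymmetric Dirichlet prior toward the matching topic:

```python
import math

# Toy 3-dimensional word vectors (hypothetical; a real system would load
# pre-trained embeddings such as word2vec vectors).
EMBEDDINGS = {
    "pizza":   [0.9, 0.1, 0.0],
    "pasta":   [0.8, 0.2, 0.1],
    "waiter":  [0.1, 0.9, 0.0],
    "food":    [1.0, 0.0, 0.0],
    "staff":   [0.0, 1.0, 0.0],
}

# One seed word per aspect, mirroring W2VLDA's minimal configuration.
SEEDS = {"food": "food", "service": "staff"}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def aspect_bias(word, base=0.1):
    """Map a word to per-aspect pseudo-counts: a small symmetric base
    plus the word's normalised similarity to each aspect's seed word.
    These counts could then bias the topic model's Dirichlet prior."""
    sims = {aspect: max(cosine(EMBEDDINGS[word], EMBEDDINGS[seed]), 0.0)
            for aspect, seed in SEEDS.items()}
    total = sum(sims.values()) or 1.0
    return {aspect: base + sim / total for aspect, sim in sims.items()}

# "pizza" is far more similar to the "food" seed than the "service" seed,
# so its prior mass is pushed toward the food aspect.
print(aspect_bias("pizza"))
```

In a full implementation these biases would enter the Gibbs sampler as asymmetric hyperparameters, so that words semantically close to a seed are more likely to be assigned to that seed's topic from the start.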
Evaluation and Results
W2VLDA’s efficacy is demonstrated through experiments across several languages and domains using the SemEval 2016 Task 5 dataset, among others. Key findings from the paper’s experimental results include:
- The system outperforms traditional LDA-based methods in aspect and sentiment classification, particularly in cases where minimal supervision is employed.
- W2VLDA attains promising results for aspect classification in multilingual scenarios and exhibits strong performance in sentiment classification, matching or exceeding that of more supervised approaches without requiring a large labeled corpus.
- A thorough assessment of the seed words' impact on classification shows that results remain stable regardless of seed choice, and that combining multiple seeds can further improve semantic coverage and accuracy.
Implications and Future Directions
The implications of W2VLDA are substantial, particularly in its potential for scaling ABSA to diverse languages and specific business use cases without incurring significant labeling costs. The paper highlights several avenues for future research, such as refining stop-word management, further exploration of specialized sentiment embeddings, and improving the handling of complex linguistic constructs like multi-word expressions and negation.
In conclusion, W2VLDA represents a significant stride towards resource-efficient, scalable sentiment analysis adaptable across languages and sectors. Its reliance on a minimal configuration underlines a broader shift towards minimally supervised learning in natural language processing.