Iterative Data Generation with Large Language Models for Aspect-based Sentiment Analysis

Published 29 Jun 2024 in cs.CL | (2407.00341v2)

Abstract: Aspect-based Sentiment Analysis (ABSA) is an important sentiment analysis task, which aims to determine the sentiment polarity towards an aspect in a sentence. Due to the expensive and limited labeled data, data generation (DG) has become the standard for improving the performance of ABSA. However, current DG methods usually have some shortcomings: 1) poor fluency and coherence, 2) lack of diversity of generated data, and 3) reliance on some existing labeled data, hindering its applications in real-world scenarios. With the advancement of LLMs, LLM-based DG has the potential to solve the above issues. Unfortunately, directly prompting LLMs struggles to generate the desired pseudo-label ABSA data, as LLMs are prone to hallucinations, leading to undesired data generation. To this end, we propose a systematic Iterative Data Generation framework, namely IDG, to boost the performance of ABSA. The core of IDG is to make full use of the powerful abilities (i.e., instruction-following, in-context learning and self-reflection) of LLMs to iteratively generate more fluent and diverse pseudo-label data, starting from an unsupervised sentence corpus. Specifically, IDG designs a novel iterative data generation mechanism and a self-reflection data filtering module to tackle the challenges of unexpected data generation caused by hallucinations. Extensive experiments on four widely-used ABSA benchmarks show that IDG brings consistent and significant performance gains among five baseline ABSA models. More encouragingly, the synthetic data generated by IDG can achieve comparable or even better performance against the manually annotated data.