Aspect-Category Sentiment Analysis
- Aspect-Category Sentiment Analysis (ACSA) is a fine-grained task that identifies discussed aspect categories and predicts the corresponding sentiment polarity.
- The methodology has evolved from rule-based and deep learning models to large language model-based generative and multi-task approaches.
- Recent research integrates emotional cues, multimodal inputs, and continual learning to enhance data efficiency and model robustness.
Aspect-Category Sentiment Analysis (ACSA) is a core fine-grained sentiment analysis task: given an input text (typically a review sentence or document) and a set of predefined aspect categories, the objective is to determine which categories are discussed and, for each, to predict the corresponding sentiment polarity. The field has undergone significant methodological and empirical evolution over the last decade, transitioning from rule-based pattern matching to deep learning architectures and, most recently, to LLM-based generative and multi-task approaches. Recent advancements further enrich ACSA by integrating emotional dimensions, multimodal cues, continual learning, and data-efficient paradigms.
1. Formal Definition and Problem Structure
Given a text x and a set of predefined aspect categories A = {a_1, ..., a_K}, the ACSA task is to output a set of pairs {(a_i, y_i)}, where y_i is the sentiment polarity (often drawn from {positive, neutral, negative}) for aspect a_i, or "none" if a_i is not discussed in x (Zhang et al., 2022). Canonically, this is a multi-label, multi-class classification problem, sometimes decomposed into aspect category detection (ACD) and aspect category sentiment classification (ACSC) (Cui et al., 2024). Extensions to the setting include handling subcategories (e.g., "food#style_options"), document-level ACSA, and multimodal inputs.
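As a concrete (hypothetical) illustration of this problem structure, the snippet below encodes one restaurant review and its multi-label output; the category names follow the SemEval style, and the sentence and labels are invented:

```python
# Hypothetical ACSA instance: sentence, candidate categories, and gold output.
sentence = "The pasta was delicious but the service was painfully slow."
categories = ["food#quality", "service#general", "price", "ambience#general"]

# Output: one (category, polarity) pair per discussed category; categories
# not mentioned in the sentence are omitted (equivalently labeled "none").
gold = {
    "food#quality": "positive",
    "service#general": "negative",
    # "price" and "ambience#general" are not discussed, so they are absent.
}
```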
2. Modeling Approaches and Architectural Evolution
Classical and Early Deep Models
- Rule-/feature-based classifiers: Early approaches used lexicon-driven rules, one-vs-all SVMs, and dependency-path features (Zhang et al., 2022).
- CNN/LSTM with attention: Neural models encode sentences via BiLSTMs or CNNs, embedding aspect categories as vectors; an attention mechanism produces a category-specific representation, and outputs are classified via softmax (Xue et al., 2018, Li et al., 2019). A minimal sketch of this category-conditioned attention appears after this list.
- Gating/graph methods: CNNs with gated units, such as the GCAE with Gated Tanh-ReLU Units, efficiently factorize sentiment and aspect cues (Xue et al., 2018). Graph attention networks (GATs) operating over constituency or dependency parses can enhance aspect–opinion alignment (Li et al., 2020).
- Multi-instance multi-label learning (MIMLL): AC-MIMLLN frames a sentence as a "bag" of word instances, with per-aspect attention identifying key instances for aspect sentiment aggregation (Li et al., 2020).
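The following is a minimal sketch of the category-conditioned attention mechanism described above, assuming pre-tokenized integer inputs and a fixed aspect-category vocabulary; it is illustrative only and not a faithful reimplementation of any cited model:

```python
import torch
import torch.nn as nn

class CategoryAttentionACSA(nn.Module):
    """Sketch of an attention-based ACSA classifier (illustrative only)."""

    def __init__(self, vocab_size, num_categories, emb_dim=100, hidden=128, num_polarities=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.cat_emb = nn.Embedding(num_categories, emb_dim)   # aspect category as a vector
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn_proj = nn.Linear(2 * hidden + emb_dim, 1)    # scores each token w.r.t. the category
        self.classifier = nn.Linear(2 * hidden, num_polarities)

    def forward(self, token_ids, category_id):
        # token_ids: (batch, seq_len); category_id: (batch,)
        h, _ = self.encoder(self.word_emb(token_ids))           # (batch, seq_len, 2*hidden)
        c = self.cat_emb(category_id).unsqueeze(1).expand(-1, h.size(1), -1)
        scores = self.attn_proj(torch.cat([h, c], dim=-1)).squeeze(-1)
        alpha = torch.softmax(scores, dim=-1)                   # category-specific attention weights
        sent_repr = torch.bmm(alpha.unsqueeze(1), h).squeeze(1) # weighted sum of token states
        return self.classifier(sent_repr)                       # polarity logits for this category
```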
LLMs and Generative Paradigms
- Sequence-to-sequence (seq2seq) generation: Reformulating ACSA as text generation lets pre-trained seq2seq models (e.g., BART, T5) reach state-of-the-art performance. The model generates natural-language strings stating each aspect's polarity ("The sentiment polarity of price is negative") (Liu et al., 2021); a generation sketch follows this list. This template-based approach excels in few-shot and zero-shot settings, directly leveraging pretraining signals.
- Instruction tuning and multi-task heads: Fine-tuning LLMs with explicit multi-output patterns ("CATEGORY#POLARITY; CATEGORY#EMOTION") enables joint prediction of sentiment and emotion, or simultaneous ACD and ACSC (Chai et al., 24 Nov 2025, Cui et al., 2024).
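Below is a sketch of the template-based generative formulation using the Hugging Face transformers API; the prompt wording and checkpoint name are assumptions, and the cited works fine-tune their seq2seq models on their own templates:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Ask a seq2seq model to state the aspect's polarity as natural language.
# An off-the-shelf t5-base will not produce the target template; a model
# fine-tuned on such templates would.
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

review = "Great food but the prices are outrageous."
aspect = "price"
prompt = f"review: {review} What is the sentiment polarity of {aspect}?"

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# A fine-tuned model would emit e.g. "The sentiment polarity of price is negative".
```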
Continual and Incremental Learning
- Category Name Embedding and shared decoders: CNE-net structures BERT inputs as [sentence; aspect1; ...; aspectN], sharing the encoder and decoder to minimize catastrophic forgetting in incremental category learning; fine-tuning on new categories does not degrade accuracy on previously learned ones (Dai et al., 2020). A sketch of this input layout follows the list.
- Unified and distant supervision: Distantly supervised architectures (e.g., DSPN) use only document-level star ratings to induce aspect-level sentiment via a hierarchical pyramid, trading annotation intensity for interpretability and efficiency (Li et al., 2023).
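A sketch of the shared-encoder input layout described above; the semicolon separator and checkpoint are assumptions rather than the exact CNE-net construction:

```python
from transformers import AutoTokenizer

# Pack the review sentence and all aspect-category names into one input so a
# shared encoder can score every category in a single pass.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

sentence = "The staff were friendly and the room was spotless."
categories = ["service", "cleanliness", "price", "location"]

encoded = tokenizer(sentence, " ; ".join(categories), return_tensors="pt")
print(tokenizer.decode(encoded["input_ids"][0]))
# [CLS] the staff were friendly ... [SEP] service ; cleanliness ; price ; location [SEP]
```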
Multimodal and Cross-Modal Fusion
- Fine-grained multimodal ACSA: MACSA and ViMACSA benchmarks, with models such as MGAM and FCMF, address settings where both text and image evidence are available. Category-aligned cross-modal graphs and attention mechanisms are used to fuse fine-grained text tokens and detected image regions of interest (RoIs), showing gains especially for implicitly mentioned aspects (Yang et al., 2022, Nguyen et al., 2024).
3. Emotional and Affective Dimensions in ACSA
Traditional ACSA models restrict supervision to coarse sentiment polarities (positive, neutral, negative). Recent work augments these labels with affective signals:
- Joint sentiment–emotion generation: Multi-task ACSA frameworks simultaneously generate both category-sentiment and category-emotion outputs. For each aspect, the framework uses an LLM prompt to predict one of Ekman's six basic emotions (anger, disgust, fear, joy, sadness, surprise, plus neutral) in addition to standard polarity (Chai et al., 24 Nov 2025).
- VAD-space refinement: To ensure emotion labels accurately reflect affective content, each emotion assigned by the LLM is projected into Valence–Arousal–Dominance (VAD) space via a DeBERTa model fine-tuned on EmoBank. If the projection conflicts with the predicted emotion's VAD centroid, a further LLM re-annotation is triggered for consistency; a sketch of this check follows the list.
- Empirical effect: Integrating emotion labels in joint supervision improves ACSA F1 (~1–2 points over strong Flan-T5 baselines), and ablations show that both the emotion supervision and the refinement step are critical (Chai et al., 24 Nov 2025).
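A minimal sketch of the VAD-consistency check, assuming a `predict_vad` regressor (standing in for the EmoBank-fine-tuned DeBERTa) and using illustrative centroid values and threshold, not the figures from the cited work:

```python
import numpy as np

# Hypothetical (valence, arousal, dominance) centroids per emotion in [0, 1]^3.
EMOTION_CENTROIDS = {
    "joy":      (0.85, 0.60, 0.65),
    "anger":    (0.20, 0.80, 0.60),
    "sadness":  (0.20, 0.35, 0.30),
    "fear":     (0.15, 0.75, 0.25),
    "disgust":  (0.20, 0.55, 0.45),
    "surprise": (0.65, 0.80, 0.45),
    "neutral":  (0.50, 0.40, 0.50),
}

def needs_reannotation(text_span, llm_emotion, predict_vad, threshold=0.35):
    """Return True if the LLM-assigned emotion conflicts with the VAD estimate."""
    vad = np.asarray(predict_vad(text_span))            # regressor output in [0, 1]^3
    centroid = np.asarray(EMOTION_CENTROIDS[llm_emotion])
    return np.linalg.norm(vad - centroid) > threshold   # large gap -> trigger LLM re-annotation
```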
Table: Multi-Task Outputs in Emotion-Enhanced ACSA
| Output Type | Label Set | Decoding Format |
|---|---|---|
| Sentiment | {positive, neutral, negative} | CATEGORY#POLARITY |
| Emotion | {anger, disgust, fear, joy, sadness, surprise, neutral} | CATEGORY#EMOTION |
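The decoding formats in the table above can be parsed back into structured labels. The sketch below assumes ";"-separated chunks with the label appended after a final "#"; since category names themselves may contain "#" (e.g., "food#style_options"), only the last "#" is split on:

```python
def parse_outputs(generated: str) -> list[tuple[str, str]]:
    """Parse a generated "CATEGORY#LABEL; CATEGORY#LABEL" string into pairs."""
    pairs = []
    for chunk in generated.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        category, label = chunk.rsplit("#", 1)   # label is a polarity or an emotion
        pairs.append((category, label))
    return pairs

print(parse_outputs("food#quality#positive; service#general#negative"))
# [('food#quality', 'positive'), ('service#general', 'negative')]
```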
4. Data, Evaluation, and Low-Resource Settings
Datasets
- SemEval 2014–16 series: English Restaurant and Laptop reviews remain the primary benchmarks, with various degrees of aspect granularity (e.g., 5 to 81 categories) (Zhang et al., 2022, Chai et al., 24 Nov 2025).
- ASAP: Large-scale Chinese reviews with up to 18 aspect categories and manual sentiment labeling, enabling robust multi-domain/multilingual studies (Bu et al., 2021).
- MAMS and MACSA: Evaluate models under multi-aspect (with conflicting sentiments) and multimodal conditions (Liu et al., 2021, Yang et al., 2022, Nguyen et al., 2024).
Metrics
- Macro-F1 across all (category, polarity) pairs is the default measure (a computation sketch follows this list).
- Many works also report precision, recall, micro-F1, strict accuracy, and per-aspect accuracy.
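A small computation sketch for these metrics, treating each (category, polarity) decision as one classification instance; the labels are invented:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

gold = ["food#positive", "service#negative", "price#negative", "food#positive"]
pred = ["food#positive", "service#neutral",  "price#negative", "food#negative"]

print("macro-F1 :", f1_score(gold, pred, average="macro", zero_division=0))
print("micro-F1 :", f1_score(gold, pred, average="micro", zero_division=0))
print("precision:", precision_score(gold, pred, average="macro", zero_division=0))
print("recall   :", recall_score(gold, pred, average="macro", zero_division=0))
```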
Data Scarcity and Augmentation
- Semantic-preserving augmentation: Automatically generated, semantically consistent paraphrases are added to training via LLM prompting. Consistency is enforced by SBERT-based filtering (cosine similarity ≥ 0.7), yielding substantial F1 improvements (up to +18 points in some low-resource scenarios) (Chai et al., 8 Jun 2025).
- Confidence-weighted fine-tuning: The loss is weighted by the model's own confidence, encouraging focus on high-certainty, correct predictions; this consistently improves performance (Chai et al., 8 Jun 2025). Both this weighting and the paraphrase filter above are sketched after this list.
- Unlabeled and weakly supervised learning: AX-MABSA achieves weak supervision by using only seed words and BERT post-training with contrastive objectives, without any labeled data, though with lower accuracy than full supervision (Kamila et al., 2022). Distant supervision with only star ratings is also explored (Li et al., 2023).
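Minimal sketches of the two low-resource techniques above, assuming the all-MiniLM-L6-v2 SBERT checkpoint and a simple confidence-times-cross-entropy weighting; the exact models and loss form in the cited work may differ:

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer, util

# (a) Semantic-consistency filtering of LLM-generated paraphrases
#     (keep only paraphrases with cosine similarity >= 0.7 to the original).
sbert = SentenceTransformer("all-MiniLM-L6-v2")

def keep_paraphrase(original: str, paraphrase: str, threshold: float = 0.7) -> bool:
    emb = sbert.encode([original, paraphrase], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item() >= threshold

# (b) Confidence-weighted cross-entropy: down-weight low-confidence predictions.
def confidence_weighted_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    probs = F.softmax(logits, dim=-1)
    confidence = probs.max(dim=-1).values.detach()        # model's own certainty
    per_example = F.cross_entropy(logits, targets, reduction="none")
    return (confidence * per_example).mean()
```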
5. Robustness, Error Analysis, and Key Challenges
Multi-Aspect and Contrasting Sentences
- Failure mode: When most training sentences lack contrastive (multi-aspect, multi-polarity) structure, models degenerate to sentence-level classifiers, failing to distinguish aspect-specific sentiments (Xu et al., 2019).
- Adaptive Re-weighting (ARW): Instance re-weighting boosts the importance of rare contrastive sentences in the loss, lifting contrastive-case F1 by 6–10 points without harming global accuracy (Xu et al., 2019); a simplified re-weighting sketch follows.
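A simplified sketch of instance re-weighting for contrastive sentences; the fixed up-weight factor is an illustrative stand-in for ARW's adaptive weights:

```python
import torch
import torch.nn.functional as F

def reweighted_loss(logits, targets, is_contrastive, up_weight=3.0):
    """Up-weight examples drawn from multi-aspect, multi-polarity sentences.

    logits: (batch, num_polarities); targets: (batch,) class indices;
    is_contrastive: (batch,) bool, True if the source sentence mixes polarities.
    """
    per_example = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.where(is_contrastive,
                          torch.full_like(per_example, up_weight),
                          torch.ones_like(per_example))
    return (weights * per_example).mean()
```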
Sentiment–Aspect Entanglement and Hierarchical Disentanglement
- Hierarchical disentanglement: ECAN explicitly separates the representation spaces for categories and sentiments, allowing different categories’ sentiment cues to be independently extracted, which is crucial in sentences with entangled aspect/sentiment structure (Cui et al., 2024).
Error Types and Limitations
- Category error dominance: Misattribution of categories (not sentiments) is the main error source, especially in domains with fine-grained, overlapping categories (Chai et al., 8 Jun 2025).
- Implicit and low-resource aspects: Models struggle with aspects not explicitly mentioned and rare categories (Cui et al., 2024).
- (Multi)modality noise: In multimodal ACSA, irrelevant or low-quality images complicate fusion (Nguyen et al., 2024).
6. Extensions: Incremental, Multilingual, Multimodal, and Unified ACSA
- Incremental learning: Shared-encoder/decoder models with dynamic input construction facilitate the introduction of new aspect categories without catastrophic forgetting (Dai et al., 2020).
- Cross-domain and multilingual: Adversarial feature alignment, pseudo-label bootstrapping, and mBERT-based zero-shot transfer have proven effective for non-English settings and domain adaptation (Zhang et al., 2022).
- Multimodal ACSA: Text–image fusion with fine-grained region–aspect alignment—implemented via cross-modal attention and GCNs—is empirically validated to surpass text-only and coarse fusion models (Yang et al., 2022, Nguyen et al., 2024).
- Unified architectures: Multi-task learning with joint losses or pyramid representations now support simultaneous ACD, ACSA, and review rating prediction (DSPN, ECAN, ASAP joint models) (Li et al., 2023, Bu et al., 2021).
7. Open Challenges and Future Directions
- Expanding emotion and affect: Current multi-task frameworks use only Ekman’s six basic emotions plus neutral; future models may benefit from more nuanced or culture-aware emotional taxonomies. End-to-end VAD embedding learning and multimodal affect integration remain open (Chai et al., 24 Nov 2025).
- Scalability and cost: Reliance on external LLMs for emotion annotation/refinement imposes cost at inference; efficient in-model alternatives are needed (Chai et al., 24 Nov 2025).
- Generalization: Cross-domain, cross-language, and multimodal transfer methods require further development, as does robustness to adversarial and noisy data (Zhang et al., 2022).
- Few-shot/zero-shot learning: Prompt-tuning and seed-based weak supervision (AX-MABSA, zero-shot prompt transfer) remain active research areas (Kamila et al., 2022, Dai et al., 2020).
- Unified and lifelong learning: Continued progress toward fully unified architectures capable of seamlessly absorbing new aspects, domains, and modalities without retraining is highlighted as a long-range goal (Zhang et al., 2022).
In summary, ACSA has matured from category-centric classification with explicit aspect mentions to flexible, emotionally enriched, cross-modal, and data-efficient paradigms. Fundamental to this evolution are innovations in architecture, supervision (multi-task, weak, and distant), and evaluation. Open problems linger at the intersection of emotional nuance, generalization, efficiency, and unified learning frameworks. The integration of emotion-enhanced, multimodal, and lifelong capabilities defines the current research frontier (Chai et al., 24 Nov 2025, Nguyen et al., 2024, Cui et al., 2024, Chai et al., 8 Jun 2025, Dai et al., 2020, Zhang et al., 2022).