PABSA: Hybrid Framework for Persian Aspect-Based Sentiment Analysis

Published 5 Oct 2025 in cs.CL and cs.LG | (2510.04291v1)

Abstract: Sentiment analysis is a key task in NLP, enabling the extraction of meaningful insights from user opinions across various domains. However, performing sentiment analysis in Persian remains challenging due to the scarcity of labeled datasets, limited preprocessing tools, and the lack of high-quality embeddings and feature extraction methods. To address these limitations, we propose a hybrid approach that integrates machine learning (ML) and deep learning (DL) techniques for Persian aspect-based sentiment analysis (ABSA). In particular, we utilize polarity scores from multilingual BERT as additional features and incorporate them into a decision tree classifier, achieving an accuracy of 93.34%, surpassing existing benchmarks on the Pars-ABSA dataset. Additionally, we introduce a Persian synonym and entity dictionary, a novel linguistic resource that supports text augmentation through synonym and named entity replacement. Our results demonstrate the effectiveness of hybrid modeling and feature augmentation in advancing sentiment analysis for low-resource languages such as Persian.


Summary

The paper presents a hybrid ML-DL model that feeds multilingual BERT polarity scores into a decision tree classifier as additional features.
It reports an accuracy of 93.34% on the Pars-ABSA dataset, surpassing existing benchmarks.
The study introduces a Persian synonym and entity dictionary that supports text augmentation through synonym and named entity replacement.

Introduction

The paper "PABSA: Hybrid Framework for Persian Aspect-Based Sentiment Analysis" (2510.04291) addresses the challenges of sentiment analysis in Persian, a low-resource language with few labeled datasets and limited preprocessing tools. Aspect-based sentiment analysis (ABSA) in Persian is further constrained by a scarcity of feature extraction methods and high-quality embeddings. The study proposes a hybrid model that integrates machine learning (ML) and deep learning (DL) techniques by feeding multilingual BERT polarity scores into a decision tree classifier as additional features. This model reaches an accuracy of 93.34% on the Pars-ABSA dataset. The authors also introduce a Persian synonym and entity dictionary to support text augmentation through synonym and named entity replacement.

Methodology

The methodology involves collecting Persian text data from diverse sources such as online reviews, social media platforms, and Persian news articles, with Digikala providing a substantial portion of over 500,000 user reviews. Preprocessing steps include tokenization, stopword removal, and normalization. ML models such as Naïve Bayes and SVM are compared against DL methods such as CNNs, RNNs, and transformers (e.g., BERT, ParsBERT). Feature extraction draws on several embedding techniques, and polarity scores from multilingual models are added as extra features to refine sentiment classification. The hybrid architecture, which combines ML and DL components, is the configuration that sets the new benchmark on the Pars-ABSA dataset.
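The preprocessing and feature-combination steps described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the stopword list, the character map, the vocabulary, and the function names (`normalize`, `tokenize`, `preprocess`, `hybrid_features`) are all assumptions, and the polarity scores are passed in as plain numbers standing in for the output of a multilingual BERT sentiment model.

```python
import re

# Hypothetical, very small stopword list for illustration only.
STOPWORDS = {"و", "در", "به", "از", "را"}

# Map Arabic yeh and kaf to their Persian code points.
CHAR_MAP = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9"})

# Arabic diacritics (U+064B to U+0652) and tatweel (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")


def normalize(text):
    """Unify character variants, drop diacritics, collapse whitespace."""
    text = text.translate(CHAR_MAP)
    text = DIACRITICS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split into word tokens, keeping the zero-width non-joiner inside words."""
    return re.findall(r"[\w\u200c]+", text)


def preprocess(text):
    """Normalize, tokenize, and remove stopwords."""
    return [tok for tok in tokenize(normalize(text)) if tok not in STOPWORDS]


def hybrid_features(tokens, vocabulary, polarity_scores):
    """Concatenate bag-of-words counts with externally computed polarity scores.

    polarity_scores is assumed to be a sequence of numbers (for example
    negative/neutral/positive probabilities) produced by a separate model.
    """
    counts = [tokens.count(term) for term in vocabulary]
    return counts + list(polarity_scores)


# Example: placeholder polarity scores, not real model output.
tokens = preprocess("خوب  و بد")
features = hybrid_features(tokens, ["خوب", "بد"], (0.1, 0.2, 0.7))
```

The resulting vector could then be passed to any decision tree implementation; the choice of bag-of-words counts as the base features here is only for brevity.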

Results

The hybrid ML-DL approach, which combines multilingual BERT polarity scores with a decision tree classifier, reached 93.34% accuracy on the Pars-ABSA dataset, surpassing previously reported results for Persian sentiment analysis models. An ablation study confirmed the benefit of integrating contextual embeddings with decision tree classifiers. The model is evaluated with accuracy, F1-score, precision, and recall.
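For reference, the four metrics named above can be computed as in the sketch below. It assumes macro averaging over classes, which the summary does not specify, and the function name `classification_metrics` and the toy labels are illustrative only.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against division by zero when a class is never predicted
        # or never present in the reference labels.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration; not data from the paper.
metrics = classification_metrics(
    ["pos", "pos", "neg", "neg"],
    ["pos", "neg", "neg", "neg"],
)
```

Macro averaging weights every class equally, which matters when sentiment classes are imbalanced; a weighted or micro average would give different numbers on the same predictions.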

Discussion

The implications of this study are twofold: practical enhancements in Persian NLP and theoretical advancements in hybrid modeling. The integration of multilingual BERT's polarity scores significantly bolsters sentiment classification robustness, proving effective in capturing nuanced sentiment cues within Persian texts. Introducing linguistic resources such as synonym dictionaries and named entity lists facilitates data augmentation, reducing model overfitting and enhancing generalization capabilities. Challenges such as Persian's morphological complexity and limited linguistic resources persist, highlighting areas for future improvement including the development of larger annotated corpora and advanced domain adaptation techniques.
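Augmentation by synonym and named entity replacement, as mentioned above, might look roughly like the following sketch. The dictionary layout (a synonym map and an entity map keyed by entity type), the sample entries, the function name `augment`, and the `prob` parameter are all hypothetical; the actual format of the paper's resource is not described in this summary.

```python
import random


def augment(tokens, synonyms, entities, rng, prob=0.3):
    """Return a copy of tokens with some words swapped for alternatives.

    synonyms: maps a word to a list of interchangeable words.
    entities: maps an entity type to a list of names of that type.
    rng:      a random.Random instance, so results can be reproduced.
    prob:     chance of replacing each replaceable token.
    """
    # Reverse lookup from an entity name to its type.
    entity_type = {name: kind for kind, names in entities.items() for name in names}
    augmented = []
    for tok in tokens:
        candidates = []
        if tok in synonyms:
            candidates = synonyms[tok]
        elif tok in entity_type:
            # Swap only with a different entity of the same type.
            candidates = [n for n in entities[entity_type[tok]] if n != tok]
        if candidates and rng.random() < prob:
            augmented.append(rng.choice(candidates))
        else:
            augmented.append(tok)
    return augmented


# Hypothetical dictionary entries for illustration.
SYNONYMS = {"خوب": ["عالی", "مناسب"]}
ENTITIES = {"brand": ["سامسونگ", "اپل"]}

sentence = ["گوشی", "سامسونگ", "خوب", "است"]
augmented = augment(sentence, SYNONYMS, ENTITIES, random.Random(0), prob=1.0)
```

Because each replacement keeps the token count and position, an aspect label attached to the original sentence can be carried over to the augmented one, provided the synonyms preserve polarity.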

Conclusion

The paper introduces a hybrid framework for Persian ABSA that reports 93.34% accuracy on the Pars-ABSA dataset. The approach and its accompanying linguistic resources can support further work in Persian text processing, with potential integration into e-commerce analytics and automated sentiment monitoring systems. Future research should focus on expanding labeled datasets, refining model architectures, and handling complex linguistic phenomena such as sarcasm and irony.
