- The paper presents a hybrid ML-DL model that integrates multilingual BERT polarity scores with decision tree classifiers.
- It achieves state-of-the-art accuracy of 93.34% on the Pars-ABSA dataset, outperforming traditional sentiment models.
- The study introduces novel Persian linguistic resources for text augmentation, enhancing feature extraction and classification.
PABSA: Hybrid Framework for Persian Aspect-Based Sentiment Analysis
Introduction
The paper "PABSA: Hybrid Framework for Persian Aspect-Based Sentiment Analysis" (2510.04291) addresses sentiment analysis for Persian, a low-resource language with few labeled datasets and limited preprocessing tools. Aspect-Based Sentiment Analysis (ABSA) in Persian suffers in particular from a scarcity of feature extraction methods and high-quality embeddings. The study proposes a hybrid model that combines machine learning (ML) and deep learning (DL) techniques by feeding multilingual BERT polarity scores into a decision tree classifier. The model reaches an accuracy of 93.34% on the Pars-ABSA dataset. The paper also introduces new resources, such as a Persian synonym and entity dictionary, to support text augmentation.
Methodology
The methodology starts with collecting Persian text from online reviews, social media platforms, and Persian news articles, with Digikala providing a substantial portion of over 500,000 user reviews. Preprocessing includes tokenization, stopword removal, and normalization. ML models such as Naïve Bayes and SVM are compared against DL methods such as CNNs, RNNs, and transformers (e.g., BERT, ParsBERT). Feature extraction draws on several embedding techniques, and polarity scores from multilingual models are added as features to refine sentiment classification. The paper credits this hybrid ML-DL architecture for the model's benchmark-setting performance on the Pars-ABSA dataset.
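The core idea, polarity scores from a neural model used as input features for a tree-based classifier, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the polarity scores are hard-coded rather than produced by multilingual BERT, the extra lexical feature (a negation count) is hypothetical, and the classifier is a depth-one decision tree (a stump) written in plain Python rather than the paper's actual model.

```python
from collections import Counter


def build_features(polarity_scores, lexical_features):
    """Concatenate model polarity scores with hand-crafted features."""
    return list(polarity_scores) + list(lexical_features)


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Fit a depth-one decision tree by minimizing weighted Gini impurity."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:
        raise ValueError("no valid split found")
    _, j, t, left_label, right_label = best
    return j, t, left_label, right_label


def predict(stump, row):
    """Classify one feature row with a fitted stump."""
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


# Toy training data: (negative score, positive score) stand in for the output
# of a multilingual sentiment model; the third value is a hypothetical
# negation-word count.
X = [
    build_features((0.10, 0.90), (0,)),
    build_features((0.20, 0.80), (1,)),
    build_features((0.85, 0.15), (0,)),
    build_features((0.70, 0.30), (2,)),
]
y = ["positive", "positive", "negative", "negative"]

stump = fit_stump(X, y)
print(predict(stump, build_features((0.60, 0.40), (1,))))
```

In a fuller pipeline the stump would likely be replaced by a library decision tree, and the hard-coded scores by the output of a fine-tuned multilingual model; the feature concatenation step stays the same.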
Results
The hybrid ML-DL approach outperformed existing benchmarks, reaching 93.34% accuracy on the Pars-ABSA dataset and surpassing previous Persian sentiment analysis models. An ablation study confirmed the benefit of integrating contextual embeddings with decision tree classifiers. The model is evaluated with accuracy, F1-score, precision, and recall.
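For readers less familiar with these metrics, the following sketch computes them from scratch on a toy set of predictions. The labels and predictions are invented for illustration, and macro-averaging of F1 is an assumption here, since the averaging scheme is not stated above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, F1) for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes in y_true."""
    labels = sorted(set(y_true))
    return sum(per_class_metrics(y_true, y_pred, l)[2] for l in labels) / len(labels)


# Invented toy labels for three sentiment classes.
y_true = ["pos", "pos", "neg", "neg", "neu", "pos"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]

print("accuracy:", round(accuracy(y_true, y_pred), 4))
print("pos (P, R, F1):", per_class_metrics(y_true, y_pred, "pos"))
print("macro F1:", round(macro_f1(y_true, y_pred), 4))
```

Reporting precision and recall alongside accuracy matters when sentiment classes are imbalanced, because accuracy alone can hide poor performance on a minority class.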
Discussion
The implications of this study are twofold: practical enhancements in Persian NLP and theoretical advancements in hybrid modeling. The integration of multilingual BERT's polarity scores significantly bolsters sentiment classification robustness, proving effective in capturing nuanced sentiment cues within Persian texts. Introducing linguistic resources such as synonym dictionaries and named entity lists facilitates data augmentation, reducing model overfitting and enhancing generalization capabilities. Challenges such as Persian's morphological complexity and limited linguistic resources persist, highlighting areas for future improvement including the development of larger annotated corpora and advanced domain adaptation techniques.
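Dictionary-based augmentation of this kind can be sketched as a synonym replacement pass over tokenized text. The sketch below is illustrative only: the tiny synonym dictionary and entity list are hypothetical stand-ins for the paper's resources, and keeping named entities unchanged is an assumed design choice rather than the paper's documented procedure.

```python
import random

# Hypothetical toy resources; the paper's actual dictionaries are not
# reproduced here.
SYNONYMS = {
    "خوب": ["عالی", "مناسب"],    # "good" -> "excellent", "suitable"
    "بد": ["ضعیف", "نامناسب"],   # "bad" -> "weak", "unsuitable"
}
ENTITIES = {"سامسونگ", "اپل"}     # "Samsung", "Apple"


def augment(tokens, rng, replace_prob=0.5):
    """Return a copy of tokens with some words swapped for synonyms.

    Tokens found in ENTITIES are never replaced (an assumption of this
    sketch), so the aspect target of the sentence is preserved.
    """
    out = []
    for tok in tokens:
        replaceable = tok not in ENTITIES and tok in SYNONYMS
        if replaceable and rng.random() < replace_prob:
            out.append(rng.choice(SYNONYMS[tok]))
        else:
            out.append(tok)
    return out


rng = random.Random(0)  # fixed seed so repeated runs give the same variant
original = "گوشی سامسونگ خوب است".split()  # "The Samsung phone is good"
augmented = augment(original, rng, replace_prob=1.0)
print(" ".join(augmented))
```

Because the replacement keeps sentence length and the entity token intact, the original aspect and polarity label can be reused for the augmented sentence, which is what makes this a cheap way to enlarge a small training set.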
Conclusion
The paper introduces a hybrid framework for Persian ABSA that reaches 93.34% accuracy on the Pars-ABSA dataset. The approach and its accompanying resources can support further work in Persian text processing, with potential integration into e-commerce analytics and automated sentiment monitoring systems. Future research should focus on expanding labeled datasets, refining model architectures, and handling complex linguistic phenomena such as sarcasm and irony.