Dynamic Adaptive Attention and Supervised Contrastive Learning: A Novel Hybrid Framework for Text Sentiment Classification

Published 12 Apr 2026 in cs.CL | (2604.10459v1)

Abstract: The exponential growth of user-generated movie reviews on digital platforms has made accurate text sentiment classification a cornerstone task in natural language processing. Traditional models, including standard BERT and recurrent architectures, frequently struggle to capture long-distance semantic dependencies and resolve ambiguous emotional expressions in lengthy review texts. This paper proposes a novel hybrid framework that seamlessly integrates dynamic adaptive multi-head attention with supervised contrastive learning into a BERT-based Transformer encoder. The dynamic adaptive attention module employs a global context pooling vector to dynamically regulate the contribution of each attention head, thereby focusing on critical sentiment-bearing tokens while suppressing noise. Simultaneously, the supervised contrastive learning branch enforces tighter intra-class compactness and larger inter-class separation in the embedding space. Extensive experiments on the IMDB dataset demonstrate that the proposed model achieves competitive performance with an accuracy of 94.67%, outperforming strong baselines by 1.5–2.5 percentage points. The framework is lightweight, efficient, and readily extensible to other text classification tasks.

Authors (1)

Qingyang Li

Summary

The paper introduces a hybrid framework that integrates dynamic adaptive multi-head attention and supervised contrastive learning to capture long-range semantic dependencies in sentiment analysis.
It employs a BERT-based model with frozen initial layers to ensure computational efficiency while achieving 94.67% accuracy and 94.66% F1-score on IMDB reviews.
Ablation studies validate the synergy of both modules, showing significant improvements in F1-score and robustness across varying text lengths.

Dynamic Adaptive Attention and Supervised Contrastive Learning for Robust Sentiment Classification

Introduction

The paper "Dynamic Adaptive Attention and Supervised Contrastive Learning: A Novel Hybrid Framework for Text Sentiment Classification" (2604.10459) introduces a combined approach for improving sentiment classification on lengthy and nuanced user-generated texts. The authors augment a BERT-based Transformer encoder with two modules: a dynamic adaptive multi-head attention mechanism and a supervised contrastive learning (SCL) branch. This hybrid architecture targets two challenges, capturing long-range semantic dependencies and resolving ambivalent emotional expressions, both of which arise frequently in datasets like IMDB movie reviews. In the reported experiments, the proposed framework outperforms established baselines such as BERT, RoBERTa, and XLNet.

Methodology

The model architecture is built upon BERT-base, with the first six layers frozen for computational efficiency. Its pipeline comprises input preprocessing, adaptive attention transformation, SCL integration, and supervised classification.

Figure 1: Overall pipeline of the proposed method.

Dynamic Adaptive Multi-Head Attention

A global context vector is obtained by mean-pooling the token embeddings and is fed into a lightweight MLP regulator, which outputs a softmax-normalized vector of head weights. Each attention head's output is multiplied by its respective weight, allowing the heads that carry the most salient sentiment information to dominate the representation while down-weighting noisy ones. This module operates in the six unfrozen (last) Transformer layers and differs from previous static or rule-based attention approaches, whose head contributions do not depend on the input.
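The gating step described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the single tanh hidden layer, the toy dimensions, the random weights, and all function names (`head_weights`, `gate_heads`, and the helpers) are assumptions made for illustration. Only the overall flow (mean pooling, MLP regulator, softmax over heads, multiplicative scaling) comes from the description.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(tokens):
    """Average token embeddings (equal-length vectors) into one context vector."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]

def linear(x, weight, bias):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def head_weights(tokens, w1, b1, w2, b2):
    """Global context -> small MLP (assumed: one tanh hidden layer) -> softmax over heads."""
    context = mean_pool(tokens)
    hidden = [math.tanh(h) for h in linear(context, w1, b1)]
    return softmax(linear(hidden, w2, b2))

def gate_heads(head_outputs, weights):
    """Scale every head's per-token output vector by that head's weight."""
    gated = []
    for head, weight in zip(head_outputs, weights):
        gated.append([[weight * value for value in token] for token in head])
    return gated

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy sizes (hypothetical; BERT-base would use 12 heads and 768-dim embeddings).
random.seed(0)
num_heads, d_model, d_head, seq_len, d_hidden = 4, 8, 2, 5, 6
tokens = rand_matrix(seq_len, d_model)
w1, b1 = rand_matrix(d_hidden, d_model), [0.0] * d_hidden
w2, b2 = rand_matrix(num_heads, d_hidden), [0.0] * num_heads
head_outputs = [rand_matrix(seq_len, d_head) for _ in range(num_heads)]

weights = head_weights(tokens, w1, b1, w2, b2)
gated = gate_heads(head_outputs, weights)
print("head weights:", [round(w, 3) for w in weights])
```

Because the weights are computed from the pooled input, they change from one review to the next, which is what distinguishes this kind of gating from a fixed per-head scale.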

Supervised Contrastive Learning Branch

Parallel to the classification pathway, the framework projects the [CLS] token embedding into a 256-dimensional space via an MLP. The SCL loss is computed within each batch: samples that share the anchor's label form positive pairs, and all other samples act as negatives, which encourages intra-class compactness and inter-class separation in feature space. The SCL branch is used only during training, where it serves as an auxiliary objective combined with the primary cross-entropy loss.
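As a rough illustration of the in-batch objective, the following dependency-free sketch computes a supervised contrastive loss in its commonly used form. The temperature value of 0.1, the L2 normalization of projections, the function names, and the 2-dimensional toy embeddings (standing in for the 256-dimensional projections) are assumptions, not details taken from the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors that have a positive."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no in-batch positive contributes nothing
        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        # Log-sum-exp over all non-anchor samples, computed stably.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Mean negative log-probability of the positives for this anchor.
        losses.append(-sum(sims[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0

labels = [1, 1, 0, 0]
# Same-class points close together: the loss should be small.
clustered = [[1.0, 0.1], [0.9, 0.0], [-1.0, 0.1], [-0.9, -0.1]]
# Same-class points far apart, different-class points close: the loss should be large.
mixed = [[1.0, 0.1], [-1.0, 0.1], [0.9, 0.0], [-0.9, -0.1]]
loss_clustered = supcon_loss(clustered, labels)
loss_mixed = supcon_loss(mixed, labels)
print(loss_clustered < loss_mixed)
```

The comparison at the end shows the intended behavior: embeddings that already group by label incur a lower loss than embeddings that mix the classes.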

Joint Optimization

Classification is performed by a linear head on globally pooled last-layer hidden states. The total loss is the sum of classification and SCL losses, weighted by a fixed hyperparameter.
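Written out, one plausible reading of this objective is the following. The summary does not say which term the weight multiplies, so placing it on the SCL term is an assumption, as are the symbols: λ for the fixed weight, τ for a temperature, z for the normalized projections, P(i) for the in-batch positives of anchor i, and A(i) for all in-batch samples other than i. The SCL term is shown in its standard form, which may differ in detail from the paper's.

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \lambda \, \mathcal{L}_{\text{SCL}},
\qquad
\mathcal{L}_{\text{SCL}} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)}
\log \frac{\exp\left(z_i \cdot z_p / \tau\right)}{\sum_{a \in A(i)} \exp\left(z_i \cdot z_a / \tau\right)}
```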

Experimental Results

All experiments were conducted on the IMDB benchmark, with results averaged over five random seeds. The proposed model consistently converges to a lower training loss and reaches higher validation accuracy faster than the baseline models.

Figure 2: Overall performance comparison: training loss curves, validation accuracy, confusion matrix, and t-SNE of feature representations.

The proposed model achieves an accuracy of 94.67% and F1-score of 94.66%, outperforming RoBERTa-base and XLNet-base by 1.2–2.5 percentage points. t-SNE visualizations display well-separated clusters, validating the discriminative enhancement provided by SCL. The confusion matrix confirms balanced detection of both sentiment classes.

Ablation studies further quantify the independent and joint contributions of each module.

Figure 3: Ablation study results across classification metrics, training time, and parameter count.

Disabling dynamic adaptive attention or SCL individually leads to significant drops in F1 (~1.6–1.8%), while removing both yields a more pronounced decline. This reflects a clear synergy, not redundancy, between the two mechanisms.

Robustness and Analysis

The model's improvements are evident across all review lengths, notably maintaining high accuracy on long (>300 word) reviews with only a 1.23% drop, where conventional models typically degrade.

Figure 4: Performance breakdown by text length groups (Short: <100 words, Medium: 100–300 words, Long: >300 words).
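The grouping in the caption can be reproduced with a few lines of code. This is a minimal sketch: counting words by whitespace splitting is an assumption, since the summary does not say how review length was measured, and the function name `length_group` is hypothetical.

```python
def length_group(review: str) -> str:
    """Assign a review to the Short / Medium / Long group by word count."""
    n_words = len(review.split())  # assumption: words are whitespace-separated
    if n_words < 100:
        return "Short"
    if n_words <= 300:
        return "Medium"
    return "Long"

print(length_group("A short but glowing review."))
```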

Hyperparameter analyses show robust stability, with a performance variation of less than 0.82% across learning rates, batch sizes, and head counts. While computational efficiency remains comparable to baselines, the auxiliary branches modestly raise training time and parameter count.

Qualitative error analyses indicate enhanced resolution of subtle sentiment shifts and sarcasm, but sporadic misclassification of highly ambiguous or culturally nuanced inputs, pointing to limitations in external knowledge representation.

Implications and Future Directions

From a theoretical standpoint, combining dynamic adaptive attention with SCL reshapes both the Transformer's attention distributions and the geometry of the embedding space, resulting in improved class separation and contextual sensitivity. Practically, this framework presents a lightweight, scalable solution for large-scale real-world sentiment analysis, especially on platforms with lengthy reviews.

Future work should validate generalization to multi-class, aspect-based, and cross-domain sentiment tasks, and explore integration with multimodal signals. Model compression and out-of-domain generalization are promising research directions for deployment on edge devices and noisy text corpora.

Conclusion

The proposed hybrid framework achieves substantive improvements in sentiment classification accuracy, F1-score, and robustness, especially for challenging long-sequence documents. Its modular adaptive attention and supervised contrastive learning components work in concert to yield better contextualization and more discriminative embeddings than standard BERT and its variants. This research sets a strong precedent for further advances in adaptive representation learning for complex NLP applications, encouraging future exploration of domain transfer, multimodal fusion, and efficient deployment strategies.