Fuzzy Modeling in VAD for Emotion Recognition
- Fuzzy Modeling in VAD is a technique that partitions the VAD cube into 27 overlapping fuzzy cuboids using IT2 fuzzy sets to capture uncertainty in self-reported emotion ratings.
- The approach integrates spatial-temporal EEG features with fuzzy VAD representations through deep architectures, resulting in significant accuracy improvements in emotion classification.
- Empirical results show that using fuzzy VAD enhances cross-subject generalization by 5–6% compared to crisp VAD baselines, demonstrating its practical benefits in affective computing.
The fuzzy modeling of emotional states in Valence-Arousal-Dominance (VAD) space addresses key limitations in traditional affective computing: the presence of subjective biases and experiment-specific variability in self-reported emotion ratings. By partitioning the crisp VAD cube into soft, overlapping interval type-2 (IT2) fuzzy sets, this approach yields a generic and flexible framework for robust emotion recognition. Deep architectures integrating fuzzy VAD representations with spatial and temporal EEG features provide substantial accuracy improvements and enhanced generalizability across subjects (Asif et al., 15 Jan 2024).
1. Construction of Interval Type-2 Fuzzy Sets in VAD Space
Fuzzy partitioning of each VAD dimension (Valence, Arousal, and Dominance) is achieved by modeling each axis with three linguistic labels (Low, Medium, High), each represented as an IT2 fuzzy set. An IT2 fuzzy set is bounded by two Gaussian membership functions, the Lower Membership Function (LMF) and the Upper Membership Function (UMF), whose enclosed region is the Footprint of Uncertainty (FoU): $\mathrm{FoU}(\tilde{A}) = \{(x,\mu) : \underline{\mu}_{\tilde{A}}(x) \le \mu \le \overline{\mu}_{\tilde{A}}(x)\}$. The Gaussian parameters (means, variances, cut-offs) for each fuzzy label are empirically specified in the study (see its Table I), and the membership degrees for all dimensions are defined by its equations (4)–(9). This "thickens" the partitions and allows soft assignment of points in the VAD space to one or more fuzzy labels, formalizing the inherent uncertainty and subjective variability in ratings.
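To make the construction concrete, the following is a minimal Python sketch of one IT2 label on a single rating axis; the Gaussian means, spreads, and the LMF height are illustrative placeholders, not the values from the study's Table I.

```python
# A minimal sketch of one IT2 fuzzy label on a single VAD axis, assuming
# illustrative Gaussian parameters (the paper's actual values are in Table I).
import numpy as np

def gaussian(x, mean, sigma):
    """Standard Gaussian membership value in [0, 1]."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2)

def it2_membership(x, mean, sigma_lower, sigma_upper, height_lower=0.9):
    """Return (LMF, UMF) degrees for one linguistic label.

    The UMF is a full-height Gaussian with the wider spread; the LMF is a
    scaled Gaussian with the narrower spread, so LMF(x) <= UMF(x) for all x
    and the region between them forms the Footprint of Uncertainty (FoU).
    """
    lmf = height_lower * gaussian(x, mean, sigma_lower)
    umf = gaussian(x, mean, sigma_upper)
    return lmf, umf

# Example: a "Medium" label centred mid-scale on a 1-9 rating axis (assumed).
lmf, umf = it2_membership(x=6.0, mean=5.0, sigma_lower=0.8, sigma_upper=1.5)
print(f"LMF={lmf:.3f}, UMF={umf:.3f}")  # LMF < UMF, as required
```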
2. Fuzzy Partitioning of the VAD Cuboid
The original VAD space is the crisp cube spanned by the three rating axes, partitioned along each axis into three overlapping fuzzy bands via the IT2 sets. Cross-over points and supports are governed by the nonzero tails of the respective Gaussians. The resulting space decomposes into fuzzy "cuboids," each corresponding to a Low/Medium/High triplet along the V, A, and D axes ($3^3 = 27$ classes). In the cuboid lattice model, these 27 fuzzy classes provide auxiliary supervision through a secondary softmax output.
No extra boundary equations are required; overlap regions are implicitly handled by the constructed membership functions.
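As an illustration of how the 27 cuboid degrees can be composed from the per-axis label memberships, the sketch below combines them with a product t-norm; the choice of t-norm and the toy numbers are assumptions, not taken from the paper.

```python
# Sketch: composing the 27 fuzzy cuboid degrees from per-axis label
# memberships. The product t-norm used here is an assumption; any t-norm
# (e.g. min) would serve the same role.
import itertools

LABELS = ("Low", "Medium", "High")

def cuboid_memberships(mu_v, mu_a, mu_d):
    """mu_* : dict mapping each label to a membership degree on that axis.

    Returns a dict of 27 cuboid degrees keyed by (V-label, A-label, D-label).
    """
    return {
        (lv, la, ld): mu_v[lv] * mu_a[la] * mu_d[ld]
        for lv, la, ld in itertools.product(LABELS, repeat=3)
    }

# Toy per-axis memberships (illustrative numbers only).
mu_v = {"Low": 0.1, "Medium": 0.7, "High": 0.3}
mu_a = {"Low": 0.0, "Medium": 0.4, "High": 0.8}
mu_d = {"Low": 0.2, "Medium": 0.9, "High": 0.1}
cuboids = cuboid_memberships(mu_v, mu_a, mu_d)
best = max(cuboids, key=cuboids.get)
print(best, round(cuboids[best], 3))  # ('Medium', 'High', 'Medium') 0.504
```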
3. Mapping Self-Reported VAD Scores to Fuzzy Memberships
Given a trial with subject-reported ratings $(v, a, d)$, each coordinate is mapped to six membership degrees (three from the LMF, three from the UMF) per axis, yielding an 18-dimensional membership vector. In certain model variants, fuzzy cluster memberships derived from Fuzzy C-Means (FCM) replace the direct IT2 memberships, as formalized in the paper's equation (14). These fuzzy labels serve as input to the deep recognition framework alongside the EEG features.
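A minimal sketch of this mapping follows, assuming placeholder Gaussian parameters and a 1–9 rating scale; only the shape of the resulting 18-dimensional vector mirrors the paper.

```python
# Sketch of the 18-dimensional membership vector for one trial: for each of
# the three axes, the LMF and UMF degrees of the three labels (Low/Med/High).
# Gaussian parameters here are placeholders, not the paper's Table I values.
import numpy as np

# (mean, sigma_LMF, sigma_UMF) per label, assuming a 1-9 rating scale.
LABEL_PARAMS = {
    "Low":    (1.0, 1.0, 1.8),
    "Medium": (5.0, 1.0, 1.8),
    "High":   (9.0, 1.0, 1.8),
}

def axis_memberships(score):
    """Six degrees for one axis: LMF then UMF for Low, Medium, High."""
    lmf = [0.9 * np.exp(-0.5 * ((score - m) / sl) ** 2)
           for m, sl, _ in LABEL_PARAMS.values()]
    umf = [np.exp(-0.5 * ((score - m) / su) ** 2)
           for m, _, su in LABEL_PARAMS.values()]
    return lmf + umf

def vad_to_fuzzy(v, a, d):
    """Concatenate per-axis degrees into the 18-dim fuzzy input vector."""
    return np.array(axis_memberships(v) + axis_memberships(a) + axis_memberships(d))

vec = vad_to_fuzzy(v=7.2, a=3.5, d=5.0)
print(vec.shape)  # (18,)
```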
4. Deep Fuzzy Framework: Architecture and Fusion
4.1 Spatial-Temporal EEG Feature Extraction
EEG signals are represented as stacked Short-Time Fourier Transform (STFT) spectrograms across channels. A spatial module applies two Conv–ReLU–MaxPool layers (32 and 64 filters, dropout 0.2) and flattens the resulting feature maps into a vector.
Temporal dependencies are modeled by repeating the flattened spatial vector to form a sequence and passing it through two stacked LSTM layers ($128$ units, dropout $0.2$). The final hidden state aggregates the temporal EEG features.
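The Keras sketch below mirrors this branch under stated assumptions: the input spectrogram shape, the 3×3 kernel, and the number of sequence repeats are placeholders, since the paper's exact values are not reproduced here.

```python
# A minimal Keras sketch of the spatial-temporal EEG branch described above.
# Input shape, kernel size, and the repeat count are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

def eeg_branch(input_shape=(64, 64, 32), repeats=8):
    """Stacked-STFT input -> two Conv blocks -> repeated vector -> 2x LSTM."""
    x_in = layers.Input(shape=input_shape)              # channels-last spectrograms
    x = layers.Conv2D(32, 3, activation="relu")(x_in)   # kernel size 3 is assumed
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Flatten()(x)
    x = layers.RepeatVector(repeats)(x)                 # build a pseudo-sequence
    x = layers.LSTM(128, return_sequences=True, dropout=0.2)(x)
    x = layers.LSTM(128, dropout=0.2)(x)                # final 128-dim hidden state
    return Model(x_in, x, name="eeg_branch")

print(eeg_branch().output_shape)  # (None, 128)
```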
4.2 Fuzzy Module and Variant Models
The fuzzy block consumes the 18-dimensional membership vector via cascaded Dense–ReLU–Dropout layers, outputting one of the following (a sketch of the Model-3 variant follows the list):
- Model 1: 24-way softmax over emotion classes.
- Model 2: FCM-derived cluster memberships, then a 24-way softmax.
- Model 3: Dual-output softmax over the $27$ cuboids and the $24$ emotions, trained jointly (the paper's equation 16).
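A hedged Keras sketch of the Model-3 fuzzy block follows; the hidden-layer widths (64, 32) are assumptions, while the 18-dimensional input and the 27/24-way heads follow the description above. Model 1 would keep only the emotion head, and Model 2 would swap the input for FCM memberships.

```python
# Sketch of the Model-3 fuzzy block: a Dense-ReLU-Dropout cascade over the
# 18-dim membership vector with two softmax heads (27 cuboids, 24 emotions).
# The hidden-layer widths are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

def fuzzy_block(input_dim=18):
    f_in = layers.Input(shape=(input_dim,), name="fuzzy_memberships")
    h = layers.Dense(64, activation="relu")(f_in)
    h = layers.Dropout(0.2)(h)
    h = layers.Dense(32, activation="relu")(h)  # shared penultimate features
    cuboid = layers.Dense(27, activation="softmax", name="cuboid")(h)    # auxiliary head
    emotion = layers.Dense(24, activation="softmax", name="emotion")(h)  # main head
    return Model(f_in, [cuboid, emotion], name="fuzzy_block")

fuzzy_block().summary()
```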
4.3 Fusion Strategy
The final feature fusion concatenates the $128$-dimensional EEG temporal vector with the last dense-layer output of the fuzzy block, followed by a joint softmax classification over the $24$ emotion classes.
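A minimal sketch of this fusion head is below; the 32-dimensional fuzzy feature width and the 64-unit dense layer are assumptions.

```python
# Sketch of the fusion head: concatenate the 128-dim EEG temporal vector
# with the fuzzy block's last dense features, then a 24-way softmax.
import tensorflow as tf
from tensorflow.keras import layers, Model

eeg_feat = layers.Input(shape=(128,), name="eeg_features")     # LSTM output
fuzzy_feat = layers.Input(shape=(32,), name="fuzzy_features")  # fuzzy dense output
fused = layers.Concatenate()([eeg_feat, fuzzy_feat])
fused = layers.Dense(64, activation="relu")(fused)             # width assumed
emotion = layers.Dense(24, activation="softmax", name="emotion")(fused)
fusion_head = Model([eeg_feat, fuzzy_feat], emotion, name="fusion_head")
fusion_head.summary()
```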
5. Optimization, Training, and Loss Functions
Training employs the Adam optimizer (batch size $32$, $100$ epochs, early stopping). Losses are standard cross-entropy for Models 1 and 2; Model 3 uses an additive cross-entropy over its dual outputs, $\mathcal{L} = \mathcal{L}_{\mathrm{CE}}^{(27)} + \mathcal{L}_{\mathrm{CE}}^{(24)}$, summing the cuboid and emotion terms. As the IT2 membership functions are fixed rather than learned, gradients propagate only through the subsequent dense layers, enabling adaptive weighting of the VAD dimensions per subject during learning.
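Under these definitions, the dual-output training setup can be sketched as follows, reusing `fuzzy_block()` from the Section 4.2 sketch; the equal loss weights are an assumption (the paper's equation 16 defines the actual combination), and the learning rate is not reproduced here.

```python
# Sketch of the Model-3 training setup, reusing fuzzy_block() from the
# Section 4.2 sketch. Equal loss weights are an assumption.
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

model = fuzzy_block()  # dual-output model defined in the earlier sketch
model.compile(
    optimizer=Adam(),  # default learning rate; the paper specifies its own
    loss={"cuboid": "categorical_crossentropy",
          "emotion": "categorical_crossentropy"},
    loss_weights={"cuboid": 1.0, "emotion": 1.0},  # assumed equal weighting
)
# Hypothetical training call (X_fuzzy, y27, y24 are placeholder arrays):
# model.fit(X_fuzzy, {"cuboid": y27, "emotion": y24},
#           batch_size=32, epochs=100,
#           callbacks=[tf.keras.callbacks.EarlyStopping(patience=5)])
```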
6. Comparative Performance and Ablation Analysis
Empirical evaluation on the DENS dataset uses 7-second windows and subject-reported VAD scores. The tested models yield 24-way emotion-recognition accuracy as follows:
| Model | Accuracy (%) | Approach |
|---|---|---|
| IT2 MF (Model-1) | 96.09 | Interval Type-2 fuzzy sets |
| Cuboid Lattice (Model-3) | 95.75 | 27-way fuzzy cuboids |
| FCM clusters (Model-2) | 95.31 | Fuzzy C-Means clustering |
Fuzzy modeling improves cross-subject generalization accuracy by 5–6%; for example, the Group 1 vs. Group 2 split attains 78.35% with fuzzy VAD versus 72.97% without. Single-subject ablations show lower performance for the crisp VAD baseline (95.01%) and for omitting the VAD input altogether (93.54%), while exclusive use of the UMF (95.82%) or the LMF (94.65%) is outperformed by the full IT2 construction (96.09%).
7. Implications and Application Domains
The generic nature of IT2 fuzzy VAD representations enables robust emotion modeling under subjectivity and inter-experimental variability. Joint fuzzy-EEG fusion facilitates improved accuracy and consistent cross-subject transfer, offering practical advantages in affective computing, human-computer interaction, and mental health monitoring. A plausible implication is that uncertainty-enriched emotion modeling extends to contexts where annotation reliability is variable or cross-population adaptation is crucial.
Real-world deployment is supported by empirically robust classification, modular architectural components, and generalizable fuzzy cuboid mappings. The methodology advances the interpretability and stability of emotion recognition in neurocognitive interfaces (Asif et al., 15 Jan 2024).