Valence-Arousal-Dominance (VAD) State
- The VAD state is a three-dimensional emotion model that quantitatively represents affect via valence, arousal, and dominance, enhancing annotation and computational recognition.
- Methods such as direct rating, proxy-based mapping, and regression-based inference support reliable VAD quantification with high inter-rater consistency.
- Applications span text, speech, vision, and agent control, where VAD informs systems for nuanced emotion recognition and adaptive human-computer interaction.
The Valence–Arousal–Dominance (VAD) state is a continuous, three-dimensional parameterization of affect, proposed in psychology and since validated and operationalized in affective computing. VAD systems represent each emotion as a point in a real coordinate space defined by valence (the degree of pleasantness or unpleasantness), arousal (the level of activation or deactivation), and dominance (the sense of personal control versus submissiveness). The VAD space has become central to the annotation, recognition, and computational modeling of emotion, providing a theoretically grounded yet practical interface between discrete emotion taxonomies and machine-learning methods for emotion recognition and affective interaction. Below, the structure, measurement, computational methodologies, and recent applications of VAD states are presented, along with their statistical robustness and practical impact.
1. Formal Structure and Foundations of the VAD State
The VAD model describes any emotional state as a point in a three-dimensional space:
- Valence (V): The pleasantness or positivity–negativity of the emotion.
- Arousal (A): The activation or intensity of the emotional state.
- Dominance (D): The perceived sense of control, power, or agency.
Quantitatively, VAD coordinates may use integer or real-valued scales (e.g., [1, 9], [–1, +1], [0, 1], or [–5, +5], depending on the application). A canonical mapping for English lexicons uses the [–1, +1] interval, where values are the means of crowd-sourced or expert ratings on Likert or semantic-differential scales. Factor-analytic studies (e.g., by Osgood, Mehrabian, and Russell) established the near-orthogonality and psychological independence of these dimensions (Mohammad, 30 Mar 2025).
Affective lexicons, such as NRC VAD Lexicon v2, encode over 55,000 English terms with VAD triples, empirically achieving split-half reliabilities above 0.95 for each dimension. The technical protocols for crowdsourcing, quality control, and aggregation in such resources are highly standardized (Mohammad, 30 Mar 2025, Mohammad, 25 Nov 2025).
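The split-half reliability figure cited above can be illustrated with a small sketch: annotators are repeatedly partitioned into two random halves, and the Pearson correlation between the halves' mean ratings is averaged. This is a generic illustration on synthetic data, not the NRC lexicon's exact protocol.

```python
import numpy as np

def split_half_reliability(ratings: np.ndarray, n_splits: int = 100, seed: int = 0) -> float:
    """Average Pearson correlation between the mean ratings of two random
    halves of the annotators, over repeated random splits.

    ratings: (n_items, n_annotators) array of per-annotator scores.
    """
    rng = np.random.default_rng(seed)
    n_annotators = ratings.shape[1]
    corrs = []
    for _ in range(n_splits):
        perm = rng.permutation(n_annotators)
        half_a = ratings[:, perm[: n_annotators // 2]].mean(axis=1)
        half_b = ratings[:, perm[n_annotators // 2:]].mean(axis=1)
        corrs.append(np.corrcoef(half_a, half_b)[0, 1])
    return float(np.mean(corrs))

# Synthetic example: 200 items, 20 annotators rating a latent valence score.
rng = np.random.default_rng(1)
latent = rng.uniform(-1, 1, size=(200, 1))
noisy = latent + rng.normal(0, 0.3, size=(200, 20))
print(round(split_half_reliability(noisy), 3))
```

With this noise level the averaged half-means track the latent score closely, so the reliability lands well above 0.9, mirroring the magnitudes reported for the lexicon.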
2. Methods for Mapping and Annotating VAD in Human and Machine Data
Annotation of VAD states proceeds via direct human rating, proxy-based methods, or computational inference:
- Direct rating: Participants assign values to each dimension for a target word, utterance, or stimulus. The NRC VAD Lexicon v2 uses 7-point scales mapped linearly to [–1, +1].
- Proxy-based methods: To ease cognitive load, geometric or visual proxies are created by users (e.g., geometric shape animations parameterized by color, size, and motion) and immediately rated by their creators along the VAD axes (Wrobel, 16 Nov 2025). Such proxy-based approaches attenuate modality biases and can robustly map discrete labels to VAD space.
- Statistical aggregation: For large-scale mapping, arithmetic means of self- or annotator-provided VAD ratings across multiple instances are used; sensitivity analyses with outlier removal (z-score based) confirm the stability of aggregate VAD coordinates (Wrobel, 16 Nov 2025).
- Computational inference: Models can estimate VAD for words or sentences through (a) lexicon matching, (b) regression from contextual features or embeddings, or (c) joint learning with classification or regression objectives (Tang et al., 4 Dec 2025, Li et al., 3 Jan 2026, Park et al., 2019).
A major challenge lies in harmonizing discrete emotion datasets (e.g., labeled as "anger", "joy", "sadness") with the continuous VAD model. Robust proxy-based mappings achieve clear separation in VAD space—for example, "anger" aligns with low V, high A, and high D; "sadness" with low V, low A, and low D (Wrobel, 16 Nov 2025).
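A nearest-prototype lookup makes this concrete. The coordinates below follow the qualitative sign pattern just described (e.g., anger: low V, high A, high D); the magnitudes are assumptions for illustration, not values from the cited mapping.

```python
import numpy as np

# Illustrative prototype coordinates in [-1, 1]^3 as (V, A, D); the signs
# follow the qualitative description above, the magnitudes are assumed.
PROTOTYPES = {
    "anger":   np.array([-0.6,  0.7,  0.5]),
    "joy":     np.array([ 0.8,  0.5,  0.4]),
    "sadness": np.array([-0.7, -0.5, -0.5]),
    "fear":    np.array([-0.6,  0.7, -0.6]),
}

def nearest_emotion(vad: np.ndarray) -> str:
    """Assign a VAD point to the closest discrete-emotion prototype."""
    return min(PROTOTYPES, key=lambda k: np.linalg.norm(PROTOTYPES[k] - vad))

print(nearest_emotion(np.array([-0.5, 0.6, 0.6])))    # falls in the anger region
print(nearest_emotion(np.array([-0.6, -0.4, -0.4])))  # falls in the sadness region
```

Note that fear and anger differ mainly along dominance here, which is exactly why a two-dimensional valence–arousal model struggles to separate them.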
3. VAD in Machine Learning: Architectures and Losses
In affective computing systems, VAD states are encoded and exploited by both deep and statistical models:
- Regression-based models: VAD is predicted as a triplet using mean squared error or concordance correlation as loss (Cho et al., 26 May 2025, Li et al., 3 Jan 2026). Models may exploit geometric structure by encoding VAD in spherical coordinates with auxiliary region classification to enforce structural consistency (Cho et al., 26 May 2025).
- Disentangled representations: Variational autoencoders (VAEs) can allocate separate latent channels for each of V, A, and D, regularized via supervised lexicon-alignment and mutual information minimization. The latent variables are trained to be independent and ideally correspond directly to the psychological factors (Yang et al., 2023, Xu et al., 26 Feb 2025).
- Multi-task training: Systems may predict both discrete emotions and continuous VAD, often sharing early feature representations but employing parallel heads for each task. Earth Mover’s Distance and other distributional losses enable ordinal modeling of VAD variables even when direct regression labels are absent (Park et al., 2019).
- Affective state dynamics: Long-horizon agents manage explicit VAD states governed by first- or second-order update rules (e.g., exponential smoothing, momentum) and inject the resulting affect vector into generation loops to produce temporally coherent behavior (Subaharan, 22 Jan 2026).
Architectural innovations include dynamic loss weighting, auxiliary region classification, and the use of appraisal-atom verifiers inspired by cognitive appraisal theory (covering goal attainment, controllability, certainty, and fairness) (Li et al., 3 Jan 2026).
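Two ingredients from the list above can be sketched framework-agnostically: a concordance-correlation loss (Lin's CCC) for VAD regression and a Cartesian-to-spherical re-encoding of the target space. The cited systems implement these inside deep-learning frameworks; this NumPy version only illustrates the quantities involved.

```python
import numpy as np

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Lin's concordance correlation coefficient for one VAD dimension."""
    mt, mp = y_true.mean(), y_pred.mean()
    vt, vp = y_true.var(), y_pred.var()
    cov = ((y_true - mt) * (y_pred - mp)).mean()
    return 2 * cov / (vt + vp + (mt - mp) ** 2)

def ccc_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """1 - mean CCC over the V, A, D columns; inputs are (n, 3)."""
    return 1 - np.mean([ccc(y_true[:, d], y_pred[:, d]) for d in range(3)])

def to_spherical(vad: np.ndarray) -> np.ndarray:
    """(V, A, D) -> (radius, azimuth, elevation): one geometric option for
    structure-aware encodings of the VAD target space."""
    v, a, d = vad
    r = np.sqrt(v * v + a * a + d * d)
    azimuth = np.arctan2(a, v)
    elevation = np.arcsin(d / r) if r > 0 else 0.0
    return np.array([r, azimuth, elevation])

rng = np.random.default_rng(0)
y = rng.uniform(-1, 1, size=(100, 3))
print(ccc_loss(y, y))        # perfect predictions -> loss 0
print(ccc_loss(y, y + 0.5))  # systematic bias is penalized -> loss > 0
```

Unlike plain MSE, the CCC term penalizes both scale and location mismatch, which is why it is a common objective for continuous emotion regression.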
4. Empirical Validation, Calibration, and Consistency
VAD-annotated resources and models are validated on both human and computational benchmarks:
- Inter-rater reliability: Annotated lexicons report split-half reliabilities (Pearson/Spearman) above 0.95 (Mohammad, 25 Nov 2025, Mohammad, 30 Mar 2025).
- Sensitivity analysis: Outlier handling (removing entries by z-score) does not alter the centroid positions of emotions in the VAD space, affirming the robustness of aggregate mappings (Wrobel, 16 Nov 2025).
- Performance on benchmarks: Multimodal, contextually rich models (e.g., E3AD's VAD estimator in autonomous driving) achieve Spearman’s ρ of 0.95, with ablation studies confirming the necessity of VAD-aware modules for both emotion estimation and downstream planning tasks (Tang et al., 4 Dec 2025).
- Domain transfer: Lexicon-weakly supervised models retain high VAD prediction fidelity (1 – RMSE ≈ 0.94 in-domain; ≈ 0.81 cross-domain) (Li et al., 3 Jan 2026).
- Model ablation: Dropping VAD-alignment, auxiliary losses, or appraisal regularization consistently reduces performance, demonstrating their centrality in accurate emotion modeling (Cho et al., 26 May 2025, Li et al., 3 Jan 2026).
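The two headline numbers above (Spearman's ρ and the 1 − RMSE fidelity score) are easy to reproduce on synthetic data. The rank-correlation implementation below assumes no tied values, which holds for the continuous synthetic targets used here.

```python
import numpy as np

def spearman_rho(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation as the Pearson correlation of ranks
    (sufficient here because the continuous data has no ties)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

def fidelity(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """1 - RMSE, the fidelity score quoted in the text; for targets in
    [-1, 1], values near 1 indicate small absolute errors."""
    return 1.0 - float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

rng = np.random.default_rng(0)
truth = rng.uniform(-1, 1, 500)
pred = truth + rng.normal(0, 0.1, 500)  # a well-calibrated mock predictor
print(spearman_rho(truth, pred), fidelity(truth, pred))
```

With noise of standard deviation 0.1 the fidelity lands near 0.9, the same order as the in-domain figure quoted above, which gives a feel for what error magnitude such scores imply.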
5. Fusion of Discrete and Continuous Emotion Models
Bridging categorical and dimensional representations has enabled more flexible and nuanced affective systems:
- Proxy-based mapping: Geometric animation proxies externalize intuition for discrete labels and anchor them in VAD space, allowing for dataset harmonization without direct cognitive mapping (Wrobel, 16 Nov 2025).
- K-means clustering: To convert between discrete classes and the continuous VAD space, cluster centers in ℝ³ are used as prototypes for basic emotions, and new observations are assigned to the nearest centroid (Jia et al., 2024).
- Earth Mover’s Distance: Models can be trained to map categorical labels to VAD using distributional losses, allowing categorical corpora to provide supervision for downstream VAD regression (Park et al., 2019).
- Practical conversion: Numerical VAD labels can be mapped to categorical or lexically-binned labels (e.g., "Very Low", "Moderate", "Very High") for human alignment in LLM-controlled generation (Choudhury et al., 14 Mar 2025).
These mappings enable conversion and fusion of multimodal datasets and resources, supporting generalizable training of models across discrete and continuous emotion annotation schemes (Wrobel, 16 Nov 2025, Jia et al., 2024).
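The practical-conversion direction above can be sketched as a simple per-dimension binning. The bin edges and five-way labels here are illustrative assumptions; the cited work's exact boundaries and vocabulary may differ.

```python
import numpy as np

# Illustrative five-way binning of each [-1, 1] dimension; the cited
# work's exact bin labels and boundaries may differ.
BIN_EDGES = [-0.6, -0.2, 0.2, 0.6]
BIN_LABELS = ["Very Low", "Low", "Moderate", "High", "Very High"]

def to_lexical_bins(vad):
    """Convert a numeric (V, A, D) triple to human-readable bin labels,
    e.g. for inclusion in an LLM prompt."""
    labels = []
    for name, value in zip(("Valence", "Arousal", "Dominance"), vad):
        idx = int(np.searchsorted(BIN_EDGES, value))
        labels.append(f"{BIN_LABELS[idx]} {name}")
    return labels

print(to_lexical_bins((0.8, 0.1, -0.5)))
# -> ['Very High Valence', 'Moderate Arousal', 'Low Dominance']
```

The lossy binning is deliberate: as Section 7 notes, human raters and LLMs align better with coarse lexical levels than with raw numeric coordinates.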
6. Applications Across Modalities and Domains
The VAD state is integral to various affective computing modalities:
- Text: Lexicon-based or deep models predict VAD from social media, political discourse, or issue-tracking comments for sentiment analysis, political stance detection, productivity/burnout monitoring, and more (Park et al., 2019, Xu et al., 26 Feb 2025, Mäntylä et al., 2016).
- Speech: Audio-based emotion recognition pipelines regress VAD, sometimes employing spherical region classification as structure-guiding auxiliary losses (Cho et al., 26 May 2025).
- Vision: Visual context (e.g., facial images) is mapped to VAD or compact 3D representations for emotion classification; deep models often recover circumplex boundaries without explicit supervision (Kervadec et al., 2018).
- Multimodal fusion: Video/audio/text fusion models for emotion detection in multimedia corpora leverage VAD as the shared space for alignment and cross-modal learning (Jia et al., 2024).
- Agent control: Autonomous and dialog agents manage internal and chain-of-thought VAD states to modulate behavior, plan actions, and maintain temporal affective coherence (Tang et al., 4 Dec 2025, Subaharan, 22 Jan 2026).
Practical applications include health monitoring, public health campaign framing analysis, digital humanities (emotional arcs in literature), and development of more human-aligned conversational and autonomous systems.
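The agent-control use case can be sketched with the first-order update rule mentioned in Section 3: the explicit VAD state is exponentially smoothed toward each new appraisal and decays toward a neutral baseline. The class name, smoothing rate, and decay rate are illustrative assumptions, not parameters from the cited systems.

```python
import numpy as np

class AffectState:
    """Explicit VAD state updated by exponential smoothing toward each
    incoming appraisal, with slow decay toward the neutral origin."""

    def __init__(self, alpha: float = 0.3, decay: float = 0.02):
        self.vad = np.zeros(3)  # start neutral: V = A = D = 0
        self.alpha = alpha      # responsiveness to new appraisals
        self.decay = decay      # per-step drift back toward neutral

    def update(self, appraisal: np.ndarray) -> np.ndarray:
        self.vad = (1 - self.alpha) * self.vad + self.alpha * appraisal
        self.vad *= (1 - self.decay)
        return self.vad

state = AffectState()
angry = np.array([-0.6, 0.7, 0.5])
for _ in range(10):              # repeated frustrating events
    state.update(angry)
print(state.vad)                 # state approaches the "angry" appraisal
for _ in range(100):
    state.update(np.zeros(3))    # a long run of neutral events
print(state.vad)                 # state relaxes back near neutral
```

Because the state moves gradually rather than jumping to each appraisal, the injected affect vector stays temporally coherent across generation steps, which is the property the long-horizon agents above rely on.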
7. Statistical Patterns, Theoretical Alignment, and Limitations
Empirical analyses and theoretical studies provide deeper insight into the VAD state:
- Orthogonality: V, A, and D are only weakly correlated, allowing distinct modeling of pleasure, activation, and control (Mohammad, 30 Mar 2025).
- Emotion boundaries: Data-driven models (e.g., CAKE) recover classic circumplex or spherical layouts, with basic emotions forming contiguous, robust regions in VAD space (Kervadec et al., 2018).
- Annotation bias and reliability: Annotation consistency and the representational granularity of VAD are maximized in multidimensional resources; limited seed sets for historical or cross-linguistic applications underperform large lexica with full VAD coverage (Hellrich et al., 2018).
- Model limitations: Automated numeric VAD control in text generation is less aligned with human perception than simple lexical or categorical labeling; presenting VAD states in human-friendly lexical bins (e.g., "High Valence") improves quality (Choudhury et al., 14 Mar 2025).
- Generalization: Explicitly managed VAD states enable superior cross-domain and cross-target transfer in stance detection, political discourse, and emotion recognition, particularly when emotional granularity is required (Xu et al., 26 Feb 2025, Li et al., 3 Jan 2026).
The VAD model, while grounded in a robust psychometric tradition and now operationalized extensively in computational pipelines, remains dependent on the quality of human annotation and on the interpretability of its axes in context-sensitive tasks.
References:
- (Wrobel, 16 Nov 2025): Wrobel, "A Proxy-Based Method for Mapping Discrete Emotions onto VAD model"
- (Tang et al., 4 Dec 2025): "E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving"
- (Li et al., 3 Jan 2026): "EmoLoom-2B: Fast Base-Model Screening for Emotion Classification and VAD with Lexicon-Weak Supervision and KV-Off Evaluation"
- (Asif et al., 2024): "Deep Fuzzy Framework for Emotion Recognition using EEG Signals and Emotion Representation in Type-2 Fuzzy VAD Space"
- (Cho et al., 26 May 2025): "EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification"
- (Park et al., 2019): "Dimensional Emotion Detection from Categorical Emotion"
- (Mohammad, 30 Mar 2025): "NRC VAD Lexicon v2: Norms for Valence, Arousal, and Dominance for over 55k English Terms"
- (Mohammad, 25 Nov 2025): "Breaking Bad: Norms for Valence, Arousal, and Dominance for over 10k English Multiword Expressions"
- (Kervadec et al., 2018): "CAKE: Compact and Accurate K-dimensional representation of Emotion"
- (Choudhury et al., 14 Mar 2025): "GPT's Devastated and LLaMA's Content: Emotion Representation Alignment in LLMs for Keyword-based Generation"
- (Subaharan, 22 Jan 2026): "Controlling Long-Horizon Behavior in LLM Agents with Explicit State Dynamics"
- (Xu et al., 26 Feb 2025): "Disentangled VAD Representations via a Variational Framework for Political Stance Detection"
- (Jia et al., 2024): "Bridging Discrete and Continuous: A Multimodal Strategy for Complex Emotion Detection"
- (Mäntylä et al., 2016): "Mining Valence, Arousal, and Dominance - Possibilities for Detecting Burnout and Productivity?"