
Self-Assessment Manikin (SAM) Tool

Updated 10 February 2026
  • Self-Assessment Manikin (SAM) is a pictorial tool that quantifies subjective emotional states along the dimensions of valence and arousal using a 9-point ordinal scale.
  • It enables both retrospective single-manikin selection and continuous 2D grid interactions, facilitating real-time tracking of affective responses during immersive experiences.
  • Empirical validations demonstrate SAM’s reliability and sensitivity, highlighting recall bias in retrospective arousal ratings and robust inter-method correlations.

The Self-Assessment Manikin (SAM) is a pictorial, nonverbal assessment tool for quantifying subjective emotional states along the two principal dimensions of valence and arousal. It enables both retrospective and continuous evaluation of affective responses to stimuli, including immersive media such as 360° video. SAM is based on selecting among schematic manikins that span a 9-point ordinal scale for each dimension. This method delivers reliable, modality-independent quantification suitable for rigorous experimental paradigms in affective computing, media experience, and human–computer interaction research (Voigt-Antons et al., 2020).

1. Structural Design and Scoring of SAM

SAM adopts the pictorial scale originally defined by Bradley and Lang (1994). Each dimension is specified by a row of nine stylized human-like manikins, ranging from extreme negative/low (left or bottom) to extreme positive/high (right or top). In valence, the leftmost manikin depicts “very unhappy,” progressing to “very happy” at the rightmost position. For arousal, the same ordinal mapping extends from “very calm” (bottom) to “very energized” (top).

Two modes of assessment are operationalized:

  • Retrospective SAM: Following each stimulus, the participant selects one manikin per dimension, corresponding to an integer in {1, 2, …, 9} (1 encodes the lowest value, i.e., very negative valence or minimal arousal, and 9 the highest).
  • Continuous SAM-based Grid: During the ongoing stimulus, participants interact with a 2D orthogonal grid whose x-axis represents valence and y-axis arousal, each discretized to [1, 9]. Each click records both dimensions simultaneously as an (x, y) pair.

The continuous clicks may occur at any frequency; all captured (x, y) coordinates for a 60 s video are averaged per dimension, yielding real numbers within [1, 9]. Both methods thereby preserve direct comparability and obviate further normalization (Voigt-Antons et al., 2020).
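The per-stimulus aggregation described above can be sketched as follows; the function name and input layout are illustrative, not taken from the paper:

```python
def average_grid_clicks(clicks):
    """Average all (valence, arousal) grid clicks for one 60 s video.

    `clicks` is a list of (x, y) pairs, each an integer in [1, 9].
    Returns per-dimension means as floats, still on the [1, 9] scale,
    so they stay directly comparable with retrospective SAM scores.
    """
    if not clicks:
        raise ValueError("no clicks recorded for this stimulus")
    valence = sum(x for x, _ in clicks) / len(clicks)
    arousal = sum(y for _, y in clicks) / len(clicks)
    return valence, arousal

# Example: three clicks during one video
print(average_grid_clicks([(5, 7), (6, 8), (4, 6)]))  # (5.0, 7.0)
```

Because both rating modes share the [1, 9] range, no rescaling step is needed before comparing them.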

2. Experimental Implementation and Procedure

In Voigt-Antons et al.’s paradigm, each participant is exposed to eight 60 s 360° videos systematically covering all quadrants of the valence–arousal space: high valence/high arousal, high valence/low arousal, low valence/high arousal, and low valence/low arousal, with two videos per quadrant. Presentation factors follow a 2 × 2 within-subjects framework: System (head-mounted display [HMD] vs. computer screen) and Rating Method (retrospective vs. continuous SAM).

Instructions are standardized as follows:

  • Prior to trials, participants review definitions (“valence” as pleasure and “arousal” as energy), then practice in desktop mode.
  • In retrospective blocks, the instruction is to select, after each video, the single manikin matching how they felt. For HMD–retrospective, participants remove the headset to access the on-screen scale.
  • In continuous blocks, participants are directed to click the grid “whenever [their] emotion shifts during the video,” allowing unconstrained frequency.

All four blocks (HMD–continuous, HMD–retrospective, Screen–continuous, Screen–retrospective) are fully counterbalanced. In continuous conditions, each click is timestamped, supporting later temporal analysis if required.
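The timestamped click records described above might be stored as in the following sketch; the class, field names, and identifiers are hypothetical, not the authors' implementation:

```python
import time
from dataclasses import dataclass, field

@dataclass
class ClickLog:
    """Timestamped (valence, arousal) clicks for one continuous-SAM trial."""
    participant: str
    video: str
    clicks: list = field(default_factory=list)  # (timestamp_s, valence, arousal)

    def record(self, valence, arousal, t=None):
        # Validate the 9-point grid range before storing the click.
        if not (1 <= valence <= 9 and 1 <= arousal <= 9):
            raise ValueError("SAM grid coordinates must lie in [1, 9]")
        timestamp = t if t is not None else time.monotonic()
        self.clicks.append((timestamp, valence, arousal))

log = ClickLog("P01", "HV-HA-1")       # illustrative participant/video IDs
log.record(6, 7, t=12.4)               # emotion shift ~12 s into the video
log.record(7, 8, t=41.0)
```

Keeping timestamps alongside the (x, y) pairs is what enables the later temporal analysis the text mentions.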

3. Data Preprocessing and Analysis

Raw retrospective SAM scores are used directly as reported integers in [1, 9]. For the continuous grid, mean values across each video’s clicks provide an averaged, potentially non-integer, value per dimension, participant, and stimulus. No additional normalization is performed, owing to the consistent scaling of both modes.

A three-way repeated-measures ANOVA is applied to both valence and arousal, with factors Video (8 levels) × System (2) × Method (2). The significance threshold is set at α = .05; Greenhouse–Geisser correction addresses sphericity violations, and generalized eta squared (η_G²) is reported for effect size.
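The full 8 × 2 × 2 design would normally be analyzed in a statistics package. As a minimal illustration of the within-subjects logic only (data are fabricated), a single two-level within factor such as Method reduces to a paired t test, with F(1, n−1) = t² on the per-subject differences:

```python
import math

def rm_anova_two_levels(cond_a, cond_b):
    """Repeated-measures F test for one two-level within-subject factor.

    With two levels, the RM ANOVA is equivalent to a paired t test:
    F(1, n-1) = t^2, where t is computed on per-subject difference scores.
    `cond_a` and `cond_b` are per-subject score lists in matching order.
    """
    n = len(cond_a)
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t ** 2, 1, n - 1  # F, df_num, df_den

# Illustrative per-participant arousal means (not the paper's raw data)
f, dfn, dfd = rm_anova_two_levels([5, 6, 7, 5], [4, 5, 5, 4])
print(f, dfn, dfd)  # 25.0 1 3
```

The multi-level Video factor and the interactions require the full partitioned sums of squares plus the Greenhouse–Geisser epsilon, which this sketch deliberately omits.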

Table 1: Key Significant Main Effects in ANOVA

Effect | Dependent | df_num | df_den | F | p | η_G²
Video | SAM_A (arousal) | 7 | 13 | 23.76 | <.001 | 0.65
Video | SAM_V (valence) | 7 | 13 | 46.99 | <.001 | 0.78
System | Presence | 1 | 22 | 6.14 | .022 | 0.22
Method | SAM_A (arousal) | 1 | 13 | 5.58 | .034 | 0.30

4. Psychometric Properties and Comparative Results

Intra-class correlation coefficients (ICCs, two-way mixed, absolute agreement, average-measures) quantify inter-method reliability:

  • Valence: ICC = 0.80 (p < .05), rated “excellent”
  • Arousal: ICC = 0.673 (p < .05), rated “good”

Empirical means (M) ± SE for 18 participants:

Condition | Valence (M ± SE) | Arousal (M ± SE)
Screen–Retrospective | 5.03 ± 0.16 | 4.70 ± 0.30
Screen–Continuous | 5.09 ± 0.14 | 5.64 ± 0.29
HMD–Retrospective | 5.11 ± 0.17 | 5.30 ± 0.29
HMD–Continuous | 5.11 ± 0.16 | 5.41 ± 0.27

Valence assessments reveal neither a significant main effect of presentation system nor a System × Method interaction. For arousal, a significant main effect of rating method is found (F(1, 13) = 5.58, p = .034, η_G² = 0.30), with continuous ratings yielding higher values, particularly in the screen condition (ΔM ≈ 0.94 on the 9-point scale). System shows no main effect for arousal.

For presence, measured by the single item G1 of the Igroup Presence Questionnaire (IPQ), the HMD induces a greater sense of presence than the screen (4.05 ± 0.16 vs. 2.85 ± 0.20; F(1, 22) = 6.14, p = .022, η_G² = 0.22) (Voigt-Antons et al., 2020).

5. Interpretation and Sources of Measurement Bias

The high reliability indices for both dimensions confirm that retrospective SAM and continuous grid methodologies target the same underlying constructs. However, a systematic underestimation of arousal occurs in retrospective screen-based ratings when compared to in-moment (continuous) responses. This suggests that recall bias—specifically, the tendency for retrospective ratings to underreport peak or sustained arousal—affects the validity of post hoc arousal assessment in non-immersive settings. In HMD presentation, concordance between the two modes increases, likely due to the immersive context’s stabilizing effect on affective recall. Valence ratings do not display the same recall bias, implying greater mnemonic stability for this construct.

6. Applicability in Immersive Media and Experimental Design Implications

SAM is validated for use in both virtual reality (VR) head-mounted and conventional desktop contexts. Its compatibility with both retrospective and continuous reporting enables flexible experimental design. Continuous in-VR SAM assessment affords superior sensitivity to transient affective peaks, particularly relevant in non-immersive conditions where retrospective bias otherwise undermines reliability. A plausible implication is that future quality-of-experience (QoE) studies in immersive media should adopt hybrid or switchable continuous grid approaches to balance attentional demands with ecological validity.

7. Integration with Broader Affective Computing Methodologies

SAM’s direct mapping to the valence–arousal framework positions it as a primary tool for subjective emotional quantification alongside other paradigms such as affective slider scales and verbal reports. Its pictorial, language-independent nature provides advantages for cross-cultural studies and minimizes reliance on linguistic interpretation. Empirical findings support its use in state-of-the-art, real-time affective annotation and in validating system designs aiming for affective responsiveness in immersive environments (Voigt-Antons et al., 2020).
