Retrospective SAM in VR: Affective Assessment

Updated 18 February 2026

Retrospective SAM is a post-stimulus, pictogram-based method that quantifies affective valence and arousal using a 9-point Likert scale in VR research.
Its implementation in VR employs head-mounted displays, a floating interface, and rigorous protocol control to ensure data reliability and cross-cultural clarity.
Statistical analyses reveal significant recall biases in valence consistent with the peak-end rule, while retrospective arousal ratings do not differ significantly from continuous measures.

Retrospective Self-Assessment Manikin (Retrospective SAM) constitutes a post-stimulus, pictogram-based, 9-point Likert assessment of the self-reported affective state, operationalized within affective computing and virtual reality (VR) experimental paradigms. Designed to capture subjective valence (pleasant–unpleasant) and arousal (calm–excited) immediately after exposure to emotionally evocative stimuli, Retrospective SAM leverages the canonical Self-Assessment Manikin (SAM) pictograms to optimize clarity and cross-linguistic applicability. A prominent implementation is documented in a remote VR emotion elicitation study, employing digitized Retrospective SAM interfaces and rigorous protocol control in home environments (Warsinke et al., 2024).

1. Protocol and Interface Implementation

Retrospective SAM was deployed in a VR experiment where participants, equipped with head-mounted displays (HMDs; Meta Quest), viewed sequences of 60-second 360° emotional stimulus videos. Immediately upon video completion, a floating interface displayed in VR prompted participants to rate the just-experienced affective state on two successive 9-point Likert scales: first for valence, then for arousal. Each Likert scale was visually anchored by the established SAM pictograms—five schematic manikins spanning the relevant emotional spectrum. Selection occurred via VR controller gaze-and-click; submission of a rating was mandatory before progression.
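The forced, two-step rating described above can be sketched as a small validated record. This is a minimal illustration, not the study's software: the names `RetrospectiveRating` and `collect_rating` and the identifier fields are hypothetical, and only the 1 to 9 range and the mandatory-submission rule come from the protocol description.

```python
from dataclasses import dataclass

SCALE_MIN, SCALE_MAX = 1, 9  # 9-point SAM scale, as described in the protocol


@dataclass(frozen=True)
class RetrospectiveRating:
    """One post-video rating (hypothetical record layout)."""
    participant_id: str
    video_id: str
    valence: int
    arousal: int

    def __post_init__(self):
        # Validate both scales against the 9-point range.
        for name in ("valence", "arousal"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{name} must be an integer scale point")
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{name} must lie in {SCALE_MIN}..{SCALE_MAX}")


def collect_rating(participant_id, video_id, valence, arousal):
    """Refuse to continue until both ratings are present (forced submission)."""
    if valence is None or arousal is None:
        raise ValueError("both ratings are mandatory before progression")
    return RetrospectiveRating(participant_id, video_id, valence, arousal)


rating = collect_rating("p01", "video_3", valence=7, arousal=4)
print(rating)
```

Rejecting incomplete input at collection time is what makes the retrospective data free of missing values by construction.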

Prior to the experimental block, all participants underwent a structured practice session. This included hands-on interface calibration, demonstration clips, and explicit on-screen and verbal introduction of the circumplex model of affect and SAM iconography. The rating procedure during the experimental session consisted of simply responding to the post-video prompt: “After the video ends, rate how you felt by selecting a value from the manikin scale for valence, then arousal” (Warsinke et al., 2024).

2. Data Processing and Analytical Design

Raw Retrospective SAM outputs comprised valence $V_{\mathrm{Retro}}\in\{1,\dots,9\}$ and arousal $A_{\mathrm{Retro}}\in\{1,\dots,9\}$, recorded immediately after each video. These were matched against continuous in-experience ratings, which were originally gathered on $[-50,50]$ axes and then linearly rescaled to the $[1,9]$ range using $x' = 1 + 8\,\frac{x + 50}{100}$. Per-stimulus continuous valence and arousal were each computed as the mean of the non-missing values among up to five time-sampled ratings.
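The linear rescaling above can be checked with a few worked values. The function name `rescale_to_sam` and its keyword parameters are illustrative assumptions; the mapping itself is the formula given in the text.

```python
def rescale_to_sam(x, src_min=-50.0, src_max=50.0, dst_min=1.0, dst_max=9.0):
    """Linearly map a continuous rating from [src_min, src_max] to [dst_min, dst_max].

    With the defaults this is x' = 1 + 8 * (x + 50) / 100.
    """
    if not src_min <= x <= src_max:
        raise ValueError(f"rating {x} outside [{src_min}, {src_max}]")
    return dst_min + (dst_max - dst_min) * (x - src_min) / (src_max - src_min)


print(rescale_to_sam(-50))  # lower endpoint maps to 1.0
print(rescale_to_sam(0))    # neutral midpoint maps to 5.0
print(rescale_to_sam(25))   # maps to 7.0
print(rescale_to_sam(50))   # upper endpoint maps to 9.0
```

The endpoints and midpoint of the continuous axis land on the endpoints and midpoint of the SAM scale, which is what makes the two measures directly comparable.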

Retrospective SAM produced no missing data because submission was forced, whereas missing continuous in-experience samples were simply omitted from the per-stimulus average. The within-subjects design (20 participants × 8 videos) yielded $n=160$ paired data points per affective measure.
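A minimal sketch of the averaging and pairing step follows. The function names, the dictionary layout keyed by `(participant, video)`, and the choice to drop a pair when every continuous sample is missing are all assumptions made for illustration; the source only states that missing samples were omitted from the average.

```python
from statistics import fmean


def per_stimulus_mean(samples):
    """Mean of the non-missing samples (None marks a missing sample)."""
    present = [s for s in samples if s is not None]
    return fmean(present) if present else None


def pair_ratings(retro, continuous):
    """Match each retrospective rating with its continuous per-stimulus mean.

    retro:      {(participant, video): int rating on 1..9}
    continuous: {(participant, video): list of rescaled samples, None if missing}
    Pairs with no usable continuous samples are skipped (an assumption).
    """
    pairs = []
    for key, retro_value in sorted(retro.items()):
        cont_mean = per_stimulus_mean(continuous.get(key, []))
        if cont_mean is not None:
            pairs.append((key, retro_value, cont_mean))
    return pairs


retro = {("p01", "v1"): 7, ("p01", "v2"): 3}
continuous = {
    ("p01", "v1"): [6.0, None, 8.0, 7.0, None],  # three usable samples
    ("p01", "v2"): [None, None, None, None, None],  # nothing usable
}
print(pair_ratings(retro, continuous))
```

With complete data for 20 participants and 8 videos, this pairing would produce the 160 rows per measure reported above.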

3. Statistical Testing and Results

Video stimuli were classified according to the circumplex model into four quadrants: Low Valence–High Arousal (LVHA), High Valence–High Arousal (HVHA), Low Valence–Low Arousal (LVLA), and High Valence–Low Arousal (HVLA). Normality (Shapiro–Wilk) and homogeneity of variance (Levene’s test) were assessed to determine the subsequent test: paired $t$-tests (with Cohen’s $d$ as effect size) where the assumptions held, or Wilcoxon signed-rank tests (with effect size $r$) otherwise.
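To make the two test statistics concrete, here is a dependency-free sketch of the paired $t$ statistic and a Wilcoxon signed-rank $z$ based on the normal approximation. It is an illustration under stated simplifications, not the study's analysis code: zero differences are dropped, tied absolute differences receive average ranks, and no tie or continuity correction is applied to the variance. It also omits the assumption checks and p-values; in practice a statistics library (for example SciPy's `shapiro`, `ttest_rel`, and `wilcoxon`) would normally be used.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    if len(diffs) < 2:
        raise ValueError("need at least two pairs")
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def wilcoxon_z(x, y):
    """Signed-rank z via the normal approximation (no tie/continuity correction)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mu) / sigma


continuous = [5.5, 6.0, 5.0, 6.5, 5.8]   # made-up illustrative values
retrospective = [6, 7, 6, 8, 7]          # made-up illustrative values
print(paired_t(continuous, retrospective))
print(wilcoxon_z(continuous, retrospective))
```

Both statistics are computed on the differences `x - y`, so their sign depends on the order in which the two rating types are passed.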

Retrospective SAM Ratings by Quadrant

Quadrant	Valence Mean ± SD	Arousal Mean ± SD
LVHA	4.00 ± 1.81	5.80 ± 1.99
HVHA	6.25 ± 1.77	5.20 ± 2.31
LVLA	3.52 ± 1.19	4.15 ± 1.81
HVLA	6.80 ± 1.94	3.90 ± 2.02

Significant differences between Retrospective SAM and continuous ratings were observed primarily for valence. For example, HVHA videos exhibited a moderate negative effect (Wilcoxon $z=-2.61$, $p=0.009$, $r=-0.41$), and LVLA videos showed a large effect (paired $t=+3.84$, $p=0.001$, $d=1.11$). In contrast, arousal did not yield significant discrepancies between Retrospective SAM and continuous sampling across any quadrant, with all $p > 0.15$ and effect sizes in the low to moderate range.
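The reported effect sizes can be put in context with a short sketch. It uses the common conversion $r = z/\sqrt{N}$ and Cohen's conventional magnitude thresholds (0.1, 0.3, 0.5 for $r$; 0.2, 0.5, 0.8 for $d$). The function names are illustrative, and $N=40$ is an assumption: the source does not state the $N$ behind this test, although that value happens to reproduce the reported $r=-0.41$.

```python
import math


def wilcoxon_r(z, n_obs):
    """Effect size r = z / sqrt(N) for a Wilcoxon signed-rank test."""
    return z / math.sqrt(n_obs)


def label_r(r):
    """Magnitude label for r using Cohen's conventional thresholds."""
    size = abs(r)
    if size < 0.1:
        return "negligible"
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "moderate"
    return "large"


def label_d(d):
    """Magnitude label for Cohen's d using the conventional thresholds."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


r_hvha = wilcoxon_r(-2.61, 40)  # N = 40 is assumed, not taken from the source
print(round(r_hvha, 2), label_r(r_hvha))  # -0.41 moderate
print(label_d(1.11))                      # large
```

Under these conventions the labels agree with the text: a moderate effect for HVHA valence and a large effect for LVLA valence.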

4. Retrospective Biases and the Peak-End Rule

Valence ratings provided via Retrospective SAM demonstrated systematic biases consistent with the psychological "peak-end rule." High-valence videos (HVHA, HVLA) were rated more positively post hoc than during continuous reporting ($r<0$ in Wilcoxon tests), indicating overestimation of peak pleasantness. Conversely, low-valence stimuli received more negative retrospective ratings ($d>0$), signifying a negative peak-end distortion. These effects resulted in Retrospective SAM valence scores exhibiting greater dispersion relative to continuously collected data, which remained more centered around neutrality.

Arousal ratings, however, did not manifest significant peak-end or recall biases. Quadrant-specific analyses failed to uncover any significant over- or under-estimation in retrospective arousal. There was a minor, non-significant tendency toward lower retrospective reports, but effect sizes ($d>0$) remained modest (Warsinke et al., 2024).

5. Validation, Reliability, and Methodological Considerations

Retrospective SAM valence ratings were found to systematically diverge from continuous measures (moderate to large effect sizes in three of four quadrants), corroborating the presence of recall biases. Nevertheless, retrospective arousal assessments appeared robust against such biases. Correlations between Retrospective SAM and baseline questionnaire measures were strong: $r=0.816$ for valence ($p<0.001$) and $r=0.668$ for arousal ($p=0.003$), supporting convergent validity despite acknowledged dispersion and recall artifacts for valence.

Remote administration of Retrospective SAM, including technical and user-instruction controls, was demonstrated to be feasible and effective. Data quality achieved in home deployments paralleled comparable lab-based studies, though with acknowledgment of additional variance arising from hardware and environmental factors.

6. Practical Guidance and Best Practices

The documented implementation provides several practical recommendations for researchers:

Always conduct a practice trial to familiarize participants with SAM pictograms and the 9-point scale functionality, especially in remote or unsupervised contexts.
Anticipate that valence ratings via Retrospective SAM will exaggerate pleasant or unpleasant experiential peaks; when temporal precision in emotional reporting is critical, supplement with in-experience continuous ratings.
For meaningful cross-modal comparisons, employ linear scaling between retrospective (1–9) and continuous rating scales.
Prior to inferential testing, confirm normality and variance assumptions to justify the choice between paired $t$-tests and Wilcoxon tests, and transparently report both test statistics and effect sizes.
In field or home environments, standardize device hardware and leverage real-time technical supervision (e.g., video conferencing) to ensure consistent experimental delivery and mitigate technical disruptions.

A plausible implication is that for constructs heavily influenced by recall biases, especially valence, Retrospective SAM should be complemented with methods less susceptible to temporal distortion if granular affective state tracking is required. In contrast, arousal measurements via Retrospective SAM are relatively robust and convergent with baseline measures, justifying their continued use in both laboratory and remote VR research contexts (Warsinke et al., 2024).

References

Warsinke et al. (2024). Comparing Continuous and Retrospective Emotion Ratings in Remote VR Study.
