Voluntary-imagined Object Presence Evaluation (VOPE)
- VOPE is a formal protocol that distinguishes internally generated object imagery from external perception using neural and computational techniques.
- It integrates fMRI, EEG, and LVLM methodologies to decode and validate imaginative object representations with specific statistical models and detection thresholds.
- VOPE offers actionable insights for differentiating true imaginative content from hallucinations, driving advances in both neuroscience research and AI evaluation.
Voluntary-imagined Object Presence Evaluation (VOPE) is a formal protocol and set of computational techniques for determining whether a biological or artificial agent is generating and/or recognizing the presence of objects purely through voluntary imagination, rather than via direct sensory input. The VOPE paradigm is used both in human neuroscience—with fMRI and EEG-based techniques to classify neural representations of imagined vs. perceived objects—and in large vision-language model (LVLM) assessment, where it distinguishes true imaginative output from hallucination or factual description. VOPE specifies empirical task designs, statistical models, presence/absence decision rules, and performance metrics for systematically verifying that internally generated content corresponds to the intended or referenced object.
1. Conceptual Foundations and Task Definitions
The core premise of VOPE is to operationalize and quantify internally generated object representations, differentiating voluntary imaginative processes from perception and from unintentional hallucination. In neuroscience, VOPE trials distinguish between physically presented stimuli and imagination-induced neural activations, typically via cues prompting voluntary imagery in the absence of external input. In LVLM evaluation, VOPE measures whether a model can self-identify the presence of objects it invents in response to generative tasks beyond pure image description—for example, whether objects mentioned in a story or reasoning output are claimed to be visually present by the model itself (Long et al., 17 Nov 2025).
In both domains, the essence of VOPE is twofold:
- Determining whether the internal state (neural or computational) encodes an intended object without overt sensory evidence.
- Distinguishing imaginative content that is self-aware ("knows" the object is absent) from hallucination or inadvertent fabrication.
2. VOPE in Human Neuroscience: fMRI and EEG Protocols
Experimental Workflow
fMRI-Based VOPE (Miyapuram et al., 2021, Horikawa et al., 2015)
- Stimuli: Paired trials with visual presentation (direct viewing of objects) and imagery (subject imagines the object after a cue).
- Trial Structure: Each trial consists of a fixation, a stimulus cue (e.g., colored shape associated with reward or no-reward), and either image presentation or imagery, followed by a forced-choice response.
- Imaging: Functional MRI acquisition (e.g., 3 T EPI sequences, typical spatial resolutions).
- Preprocessing: Motion correction, spatial normalization, smoothing, high-pass filtering.
- Region of Interest (ROI): Anatomical localization (e.g., substantia nigra, ventral tegmental area), with ROI features extracted based on GLM contrasts for physical and imagined stimulation (see the extraction sketch after this list).
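As a concrete illustration of the preprocessing and ROI steps above, the following minimal sketch uses nilearn's NiftiMasker to pull voxel time courses from a hypothetical midbrain mask; the file names, TR, and filter settings are illustrative assumptions, not the published parameters.

```python
# A minimal sketch (not the published pipeline), assuming a hypothetical
# midbrain ROI mask and a preprocessed BOLD run.
from nilearn.maskers import NiftiMasker

masker = NiftiMasker(
    mask_img="midbrain_roi.nii.gz",   # hypothetical anatomical ROI mask
    smoothing_fwhm=6,                 # spatial smoothing (mm)
    standardize=True,                 # z-score each voxel time course
    high_pass=1 / 128,                # high-pass filter cutoff (Hz)
    t_r=2.0,                          # assumed repetition time (s)
)
X = masker.fit_transform("sub01_run01_bold.nii.gz")  # (n_volumes, n_voxels)
```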
EEG-Based VOPE (Robinson et al., 2020)
- Stimuli: Discrete object positions (e.g., small disk at one of six polar coordinates).
- Trial Structure: Pattern-estimator runs (randomized visible stimuli for decoder training) and VOPE tracking runs (object becomes occluded, participant imagines tracking its position).
- EEG Setup: 64-channel system, high sampling rate, FIR filtering.
- Artifact Control: Exclude participants/trials based on EOG signals; trial balancing and channel interpolation for noise control.
Decoding and Evaluation
- Feature Construction: Extract ROI beta values (fMRI) or time-resolved channel voltages (EEG); feature vectors for classification or encoding.
- Classifier/Decoder: Linear SVM (fMRI, reward vs. no-reward object) or LDA (EEG, spatial positions); encoders/decoders are trained on perception data and tested on imagery (see the cross-decoding sketch after this list).
- Presence Criterion: For fMRI/EEG, declare object presence if classifier output or encoding amplitude for the target exceeds an empirically set threshold. For example, AUC = 0.78 for midbrain features in reward imagery (Miyapuram et al., 2021); single-trial detection based on probability or channel amplitude thresholds in EEG (Robinson et al., 2020).
- Metrics: Accuracy, confusion matrix, ROC curve, and statistical benchmarks (e.g., χ² tests, Bayes factors).
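The train-on-perception, test-on-imagery scheme can be sketched as follows with scikit-learn; the data here are synthetic stand-ins for ROI betas or channel voltages, and the zero decision threshold is a placeholder for the empirically calibrated one.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins: (n_trials, n_features) feature matrices and binary labels.
rng = np.random.default_rng(0)
X_percept, y_percept = rng.normal(size=(80, 50)), np.repeat([0, 1], 40)
X_imagine, y_imagine = rng.normal(size=(40, 50)), np.repeat([0, 1], 20)

clf = LinearSVC(C=1.0).fit(X_percept, y_percept)  # train on perception trials
scores = clf.decision_function(X_imagine)         # apply to imagery trials
auc = roc_auc_score(y_imagine, scores)            # cf. AUC = 0.78 for midbrain features
present = scores > 0.0                            # presence call at a placeholder threshold
```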
Key Findings and Implications
- Robust decoding of imagined objects is achievable with accuracy exceeding chance, even on novel (unseen) object categories (Horikawa et al., 2015).
- Shared neural codes and top-down recruitment in visual cortex and midbrain support both perception and voluntary imagery (Miyapuram et al., 2021, Horikawa et al., 2015).
- EEG reveals anticipatory and temporally variable markers of imagined position tracking, distinguishing generative imagery from mere attention (Robinson et al., 2020).
3. VOPE in LVLMs: Imagination, Hallucination, and Self-Awareness
Formal Definition and Decision Criteria
- Presence-Evaluation Protocol: After the model generates a response (e.g., a story, explanation, or reasoning chain), extract all object mentions in the output. For each, issue a follow-up prompt ("Does the object X appear in the image?") to the same model. The binary answer defines the model’s internal belief about object presence (Long et al., 17 Nov 2025).
- Ground-Truth Comparison: Cross-reference the model's belief with a human- or detector-derived presence label for each object in the original image.
- Categorization: Each object mention is classified into one of four sets: true description (DT), hallucination in description (DH), true imagination (IT), or hallucination in imagination (IH).
Algorithmic Outline
```python
# Categorization loop: sort each mentioned object into the four VOPE sets.
DT, DH, IT, IH = [], [], [], []
for o in extract_objects(y):                # object mentions in response y
    # Re-query the same model for its presence belief about object o.
    h = parse_binary(model(image, f"Does {o} appear in the image?"))
    p = ground_truth_presence(o, image)     # human/detector label
    if h == 1:                              # model asserts presence
        if p == 1:
            DT.append(o)                    # true description
        else:
            DH.append(o)                    # hallucination in description
    else:                                   # model denies presence
        if p == 0:
            IT.append(o)                    # true imagination
        else:
            IH.append(o)                    # hallucination in imagination
```
Quantitative Metrics
- Hal-D: Hallucination rate in factual description, Hal-D = |DH| / (|DT| + |DH|), the fraction of objects claimed present that are actually absent.
- Hal-I: Hallucination rate in imaginative content, Hal-I = |IH| / (|IT| + |IH|), the fraction of objects the model treats as imagined that are actually present.
- Exp: Expressive tendency, Exp = (|IT| + |IH|) / (|DT| + |DH| + |IT| + |IH|), the proportion of mentions the model treats as imagined.
- CHAIR: A conventional hallucination metric that erroneously conflates true imagination with hallucination.
By scoring true imagination (IT) separately, VOPE avoids penalizing it as hallucination; a minimal scoring sketch follows.
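Under the set definitions above, the three VOPE scores for a single response reduce to a few ratios; a minimal sketch (helper name assumed, inputs are the lists built by the categorization loop):

```python
def vope_scores(DT, DH, IT, IH):
    """Compute VOPE rates from the four object sets of one response."""
    desc = len(DT) + len(DH)   # mentions the model asserts are present
    imag = len(IT) + len(IH)   # mentions the model acknowledges as absent
    total = desc + imag
    return {
        "Hal-D": len(DH) / desc if desc else 0.0,  # hallucination in description
        "Hal-I": len(IH) / imag if imag else 0.0,  # hallucination in imagination
        "Exp":   imag / total if total else 0.0,   # expressive tendency
    }
```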
Empirical Results
- All tested LVLMs exhibit substantially higher Hal-I in imaginative tasks (writing/reasoning) than Hal-D in purely factual tasks, confirming a dissociation between factual accuracy and imagination self-awareness.
- Reducing the expressivity (Exp) via contrastive decoding modulates the amount of imaginative content without reducing imaginative hallucination rates (Hal-I).
- Existing hallucination mitigation strategies lower Hal-D but are largely ineffective on Hal-I, and sometimes increase it (Long et al., 17 Nov 2025).
Table: LVLM Hallucination and Imaginative Tendency (select data from (Long et al., 17 Nov 2025))
| Task | Model | Hal-D (%) | Hal-I (%) | Exp (%) |
|---|---|---|---|---|
| Captioning | LLaVA1.5 | 21.2 | 19.3 | 13.6 |
| Captioning | Qwen2.5-VL | 7.3 | 43.6 | 15.2 |
| Reasoning | Qwen2.5-VL | 5.2 | 32.8 | 23.4 |
| Writing | Qwen2.5-VL | 7.3 | 35.8 | 30.2 |
| Writing | Gemini2.0 | 17.2 | 21.1 | 27.4 |
4. Implementation Methodologies Across Modalities
Human Decoding (fMRI/EEG)
- fMRI: Linear SVM classification on midbrain ROI activation patterns or sparse regression decoding from fMRI voxels to visual feature space (e.g., mid-level CNN units (Horikawa et al., 2015)), thresholded by accuracy or predicted-feature correlation.
- EEG: LDA or forward encoding models, with amplitude and decoding-statistic thresholds set to control false-positive rates on baseline intervals (Robinson et al., 2020); a time-resolved decoding sketch follows this list.
- Performance Validation: ROC curves, AUC scores; single-subject and group-level statistical benchmarks; behavioral correlation as secondary evidence (e.g., imagery vividness rating).
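A time-resolved variant of this validation can be sketched by fitting an LDA independently at each time point; the epoch shapes and labels below are synthetic assumptions rather than the published data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic epochs: (n_trials, n_channels, n_times); labels index 6 spatial positions.
rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(120, 64, 200)), np.tile(np.arange(6), 20)
X_test,  y_test  = rng.normal(size=(60, 64, 200)),  np.tile(np.arange(6), 10)

acc = np.empty(X_train.shape[-1])
for t in range(X_train.shape[-1]):
    lda = LinearDiscriminantAnalysis().fit(X_train[:, :, t], y_train)
    acc[t] = lda.score(X_test[:, :, t], y_test)  # decoding accuracy at each time point
```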
LVLMs
- Natural Language Processing: Noun-chunk extraction for object mentions; simple “yes/no” parsing for binary presence belief; candidate categories matched with human/object detection labelling.
- Automated Pipeline: All stages (extraction, rechecking, comparison, categorization, and scoring) are automatable, with custom prompts and parser rules per model (see the sketch after this list).
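A minimal sketch of these two NLP stages, using spaCy for noun-chunk extraction; the yes/no parser here is deliberately crude, and real pipelines need per-model parser rules.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is installed

def extract_objects(text: str) -> list[str]:
    # Head nouns of noun chunks serve as candidate object mentions.
    return [chunk.root.lemma_.lower() for chunk in nlp(text).noun_chunks]

def parse_binary(answer: str) -> int:
    # Crude yes/no parsing of the model's presence belief.
    return 1 if answer.strip().lower().startswith("yes") else 0
```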
Real-Time/Closed-Loop Application
- Brain-Computer Interface (BCI) Applications: Real-time fMRI with GPU-accelerated preprocessing and feature decoding; closed-loop paradigms for subject feedback or model updating (Horikawa et al., 2015).
- Adaptive Thresholding: Calibration trials to individualize threshold settings for optimal sensitivity/specificity, as sketched below.
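One way to implement such calibration is to set the threshold from decoder scores on object-absent calibration trials so that the false-positive rate stays at a target level; a minimal sketch with assumed inputs:

```python
import numpy as np

def calibrate_threshold(absent_scores, target_fpr=0.05):
    # Threshold at the (1 - target_fpr) quantile of object-absent scores,
    # so only ~target_fpr of absent trials would be called "present".
    return float(np.quantile(absent_scores, 1.0 - target_fpr))

# Example: scores from 200 no-object calibration trials.
thr = calibrate_threshold(np.random.default_rng(2).normal(size=200))
```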
5. Practical Considerations, Limitations, and Future Extensions
- Neural Domain: Limitations of spatial resolution in fMRI, variability in imagery vividness, sensitivity to baseline attentional state, need for appropriate counterbalancing (e.g., “no-tracking” baseline in EEG).
- LVLM Domain: Current mitigation methods are inadequate for improving Hal-I; presence-aware self-checking of imagined content remains difficult. LVLMs often conflate "relevant" and "present," leading to high Hal-I even when factual description is accurate (Long et al., 17 Nov 2025).
- Extension to Multiclass and Generalized Tasks: VOPE criteria can be extended beyond binary decisions to multiclass object representation, sequence and scene imagination, and integration with generative models such as GANs/VAEs for richer feedback (Horikawa et al., 2015).
- Domain Generalization: VOPE is applicable in diverse domains, including imagery for faces, places, abstract objects, and narrative content.
6. Illustrative Examples and Interpretive Context
- In LVLMs, a generated object (e.g., “toy” in a story) which the model admits is not in the image is scored as “true imagination” rather than a hallucination, according to VOPE (Long et al., 17 Nov 2025).
- Conversely, if the model mentions an object that is actually present in the image (e.g., "books") but, when asked, denies its presence and treats it as imagined, the mention is classified as a hallucination in imagination (IH).
- Attention maps in LVLMs show strong image-region focus during factual description but dispersed or weak attention during imagined content, pointing to a mechanistic basis for presence/absence errors.
- In fMRI/EEG paradigms, time-resolved decoding validates the formation of stimulus-like activity during voluntary imagery epochs, even without external input (Miyapuram et al., 2021, Robinson et al., 2020).
7. Significance and Impact
VOPE establishes a principled, quantitative interface between internal object representation—whether biological or artificial—and external validation. In neuroscience, it enables empirical readout of covert imagery and memory, supporting hypotheses of neural code overlap and anticipatory processes. In LVLMs, VOPE clarifies the boundary between creative voluntary imagination and unintentional hallucination, providing precise metrics for evaluating model self-awareness and interpretive reliability. A plausible implication is that further refining VOPE will be central to future research on creativity, truthfulness, and introspective ability in both brain-based and artificial cognition (Long et al., 17 Nov 2025, Miyapuram et al., 2021, Robinson et al., 2020, Horikawa et al., 2015).