Prelim Attention Score (PAS) Overview
- PAS denotes metrics that aggregate attention-related signals: one detects object hallucinations in large vision–language models, the other quantifies visual attention in neurodevelopmental disorder (NDD) evaluations.
- In LVLMs, PAS sums the self-attention allocated to previously generated tokens (averaged over heads); it correlates strongly with the risk of generating ungrounded object descriptions.
- For NDD assessment, PAS integrates spatial, temporal, and performance metrics from serious games to provide a composite score that informs intervention readiness.
The Prelim Attention Score (PAS) refers to distinct but related metrics developed for two domains: (1) hallucination detection in large vision–language models (LVLMs), and (2) composite attention evaluation in adaptive scoring frameworks for assessing attention in children with neurodevelopmental disorders (NDD) via serious games. In both contexts, PAS aggregates dynamic attention-related information, but the computational substrate and application objectives are fundamentally different.
1. PAS in Large Vision–Language Models (LVLMs)
The PAS for LVLMs quantifies the proportion of self-attention allocated to previously generated output tokens (prelim tokens) at each decoding step during autoregressive generation. Its explicit goal is to detect “object hallucinations,” i.e., when a model generates object mentions not grounded in the provided visual context. The method assumes that excessive reliance on preceding tokens (rather than image tokens) strongly correlates with such hallucinations (Hoang-Xuan et al., 14 Nov 2025).
Theoretical Basis
The PAS formalism is grounded in the conditional mutual information $I(v; y_t \mid q, y_{<t})$ between the image $v$ and the $t$-th output token $y_t$, given the previous outputs $y_{<t}$ and the instruction $q$. A token with low mutual information, i.e., one generated chiefly from the model's own previous outputs, signals potential hallucination. Direct computation of $I(v; y_t \mid q, y_{<t})$ is intractable due to the need to marginalize over an unknown image distribution; PAS circumvents this by leveraging internal self-attention weights.
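With this notation (a restatement for illustration, not the paper's exact derivation), the criterion can be written as

$$I\!\left(v;\, y_t \mid q,\, y_{<t}\right) \;=\; \mathbb{E}\!\left[\log \frac{p(y_t \mid v, q, y_{<t})}{p(y_t \mid q, y_{<t})}\right] \;\approx\; 0 \;\;\Longrightarrow\;\; y_t \text{ is likely ungrounded in } v,$$

where the denominator $p(y_t \mid q, y_{<t})$ is precisely the image-marginalized distribution that makes direct evaluation intractable.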
2. Mathematical Definition and Computation
Let $x = (v, q)$ denote the concatenation of image tokens $v$ (of length $n_v$) and instruction tokens $q$ (of length $n_q$), with total prompt length $n_p = n_v + n_q$. Output tokens $y_1, \dots, y_T$ are generated autoregressively as in typical decoder-only architectures.
Let $A^{(\ell, h)}$ denote the attention-weight matrix at layer $\ell$ and head $h$. For each generated object token $y_t$ ($t > 1$), the PAS is defined as

$$\mathrm{PAS}_t \;=\; \frac{1}{H} \sum_{h=1}^{H} \sum_{j \in \mathcal{P}_t} A^{(\ell, h)}_{t, j},$$

where $H$ is the number of heads and $\mathcal{P}_t$ indexes all previous output (prelim) tokens $y_{<t}$. By construction, each head's attention weights sum to one, so $\mathrm{PAS}_t + a^{\mathrm{img}}_t + a^{\mathrm{ins}}_t + a^{\mathrm{bos}}_t = 1$, with the terms corresponding to attention allocated to prelim, image, instruction, and beginning-of-sequence tokens, respectively.
Empirically, $\mathrm{PAS}_t$ shows a strong positive correlation with object-hallucination risk across several LVLM architectures.
Algorithmic Implementation
- PAS is computed on the fly during decoding using attention weights already produced by a standard forward pass.
- For each output step, the attention tensor is sliced to extract relevant weights, summed per head, then averaged.
- This incurs negligible additional computation and memory, requiring no extra model passes or storage beyond transient attention matrices (see the sketch below).
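The following is a minimal sketch of this computation, assuming Hugging Face-style per-layer attention tensors of shape (batch, heads, query_len, key_len), a prompt laid out as [BOS, image tokens, instruction tokens], and a single illustrative layer; the function name and layer choice are not from the source.

```python
import torch

def prelim_attention_score(attentions, n_img, n_ins, step, layer=-1):
    """Illustrative PAS for the output token generated at `step` (0-indexed).

    attentions: tuple over layers of tensors shaped (batch, heads, q_len, k_len),
                e.g. as returned by a decoder forward pass with output_attentions=True.
    n_img, n_ins: number of image and instruction tokens in the prompt.
    Assumes key positions are ordered [BOS | image | instruction | prelim tokens].
    """
    attn = attentions[layer]            # (batch, heads, q_len, k_len)
    row = attn[0, :, -1, :]             # attention row of the newest token, per head
    prelim_start = 1 + n_img + n_ins    # skip BOS, image, and instruction tokens
    prelim_end = prelim_start + step    # previously generated output tokens only
    # Sum attention mass on prelim tokens per head, then average over heads.
    pas = row[:, prelim_start:prelim_end].sum(dim=-1).mean()
    return pas.item()
```

In a generation loop run with attention outputs enabled, this can be evaluated once per decoding step and compared against a threshold.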
3. Application to Hallucination Detection
In practice, PAS is thresholded to yield a binary hallucination decision. On multiple datasets (MSCOCO, Pascal VOC) and models (LLaVA-1.5-7B, MiniGPT-4-7B, Shikra-7B), PAS achieves higher AUROC for hallucination detection than baseline techniques, including hidden-state consistency, logit-based uncertainty, and image-attention sums.
| Method | Avg AUROC |
|---|---|
| NLL (logits) | 62.2 |
| Entropy | 67.4 |
| IC (hidden) | 71.9 |
| GLSim | 65.6 |
| SVAR (image attn) | 80.3 |
| PAS (ours) | 85.0 |
A single fixed PAS threshold optimizes performance across models (Hoang-Xuan et al., 14 Nov 2025).
PAS is also robust to the choice of decoding strategy, maintaining high AUROC across greedy, beam, top-$k$, and nucleus sampling.
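As a usage illustration (not the authors' evaluation code), PAS scores can be thresholded into binary hallucination flags and scored with AUROC via scikit-learn; the scores, labels, and threshold below are placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Placeholder PAS values for generated object tokens and ground-truth labels
# (1 = hallucinated object, 0 = grounded object); values are illustrative only.
pas_scores = np.array([0.12, 0.55, 0.08, 0.61, 0.33, 0.47])
labels     = np.array([0,    1,    0,    1,    0,    1])

# Higher PAS corresponds to higher hallucination risk, so PAS serves directly as the score.
auroc = roc_auc_score(labels, pas_scores)

# Binary decision with an illustrative threshold (the paper reports a tuned value).
threshold = 0.4
flags = pas_scores > threshold

print(f"AUROC: {auroc:.3f}, flagged: {flags.tolist()}")
```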
4. Extensions and Limitations in Vision–Language Models
PAS is most effective in models with full non-local (global) attention and standard decoder-only Transformer structure. In architectures with sparsified or restricted attention, such as Longformer, PAS may not be computable due to the absence of uniform access to prelim tokens. Additionally, in multi-turn or context-rich scenarios where high prelim attention is legitimate, PAS may produce false positives. Attribute or relational hallucinations (e.g., erroneous colors) are not directly detected; PAS focuses on existence errors.
Suggested extensions include learned head-weighting schemes, PAS–image-attention fusion, and deployment in dynamic or training-time settings.
5. PAS in Adaptive Attention Scoring for NDD Assessment
Independently, PAS is also employed as a composite metric within an adaptive scoring framework for quantifying visual attention in children with NDD playing educational games (Rehman et al., 10 Sep 2025). Here, PAS aggregates spatial, temporal, and game-performance attention subscores, integrating raw eye-tracking data and behavioral signals in real time.
Composite Score Construction
At each session/level $i$, the PAS is computed as

$$\mathrm{PAS}_i \;=\; S_i + \alpha_i\, T_i,$$

where
- $S_i$ combines spatial attention events, area-of-interest (AoI) efficiency, focus duration, and a dynamic performance bonus.
- $T_i$ aggregates temporal engagement measures, including normalized durations, sustained engagement, and penalties for excess transitions.
- $\alpha_i$ is an adaptive multiplier that downweights $T_i$ for high-performing sessions to prevent over-rewarding spurious temporal fluctuations.
Subscores are normalized and parameterized by empirically chosen weights, with all key computation steps detailed in (Rehman et al., 10 Sep 2025); a schematic sketch follows.
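A minimal sketch of the composite form above, assuming subscores already normalized to [0, 1] and a simple accuracy-based rule for the adaptive multiplier; the function name, threshold, and damping constant are illustrative, not the published parameterization.

```python
def composite_pas(spatial_score, temporal_score, accuracy,
                  high_perf_threshold=0.85, damping=0.5):
    """Illustrative composite PAS for one session/level.

    spatial_score, temporal_score: subscores assumed normalized to [0, 1].
    accuracy: game performance in [0, 1], used to derive the adaptive multiplier.
    The downweighting rule and constants are placeholders, not the published weights.
    """
    # Damp the temporal contribution for high-performing sessions to avoid
    # over-rewarding spurious temporal fluctuations.
    multiplier = damping if accuracy >= high_perf_threshold else 1.0
    return spatial_score + multiplier * temporal_score
```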
Validation
The PAS, thus computed, is evaluated against ground-truth game accuracy using standard agreement metrics (a computational sketch follows this list):
- mean absolute error (MAE, %)
- root-mean-square error (RMSE, %)
- Pearson correlation
- Spearman correlation
Benchmarks for intervention readiness are specified as maximum acceptable MAE/RMSE (in %) and minimum correlation levels (Rehman et al., 10 Sep 2025).
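A small sketch of these comparisons, assuming per-session PAS values and ground-truth accuracies on a common percentage scale; the function name is illustrative.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def validate_pas(pas, accuracy):
    """Compare per-session PAS against ground-truth game accuracy (both in %, assumed)."""
    pas = np.asarray(pas, dtype=float)
    accuracy = np.asarray(accuracy, dtype=float)
    mae = np.mean(np.abs(pas - accuracy))                 # mean absolute error
    rmse = np.sqrt(np.mean((pas - accuracy) ** 2))        # root-mean-square error
    pearson_r, _ = pearsonr(pas, accuracy)                # linear agreement
    spearman_rho, _ = spearmanr(pas, accuracy)            # rank agreement
    return {"MAE": mae, "RMSE": rmse, "Pearson": pearson_r, "Spearman": spearman_rho}
```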
6. Comparative Overview
| Domain | PAS Purpose | Mathematical Basis | Key Applications |
|---|---|---|---|
| LVLMs | Hallucination detection | Self-attention sum | Real-time token filtering, fallback triggering, decoding adjustment (Hoang-Xuan et al., 14 Nov 2025) |
| NDD Assessment | Composite attention | Multimetric weighted sum | Educational assessment, intervention readiness (Rehman et al., 10 Sep 2025) |
While PAS in both domains shares an underlying motivation—distilling complex behavioral or internal state signals into a single interpretable metric—the operational definitions and computational machinery are distinct.
7. Future Directions
For LVLMs, potential enhancements include task-adaptive head aggregation, PAS fusion with image-attention and uncertainty measures, dynamic threshold calibration under instruction tuning, and generalization to multi-image or textual document analysis. For adaptive attention frameworks, plausible developments are further behavioral contextualization, refined temporal segmentations, and integration with multimodal learning analytics.
A plausible implication is that PAS, as a paradigm, exemplifies a class of lightweight, interpretable proxies for latent information flow and attention allocation that are useful beyond their original domains.