Reference-Centered Dynamic Scanning

Updated 10 November 2025

A reference-centered dynamic scanning strategy is a scanning framework in which gaze is allocated relative to an initial fixation point, integrating scene saliency with dynamic spatial references.
It employs inhibitory tagging to suppress revisits to previously fixated areas, which produces measurable transients and overshoots in gaze trajectories.
Model comparisons indicate that SceneWalk, which incorporates this reference-centered inhibition, predicts human scanpaths better than static density or saccadic-momentum models.

A reference-centered dynamic scanning strategy is a paradigm for sequential allocation of visual (or sensor) fixations in which each planned gaze position is selected not only according to the underlying scene structure (such as saliency maps) but also relative to an explicit, dynamically updated spatial reference, typically the initial fixation position. The scanpath is influenced by the spatiotemporal history of gaze: the trajectory evolves in coordinates centered on a reference point, and oculomotor planning incorporates mechanisms that discourage refixation of already-visited areas. This framework is formalized and empirically tested by Rothkegel et al. (2016), who quantify and model how experimentally manipulated initial fixations govern the subsequent dynamics of human eye movements in natural scene viewing. Central to this account is inhibitory tagging, whereby previously fixated locations acquire a time-varying suppression in attentional priority, producing characteristic temporal transients and systematic “overshoots” in fixation distributions.

1. Formal Definition and Notation

Let $x_0 \in \mathbb{R}^2$ denote the initial fixation position, which is experimentally controlled, and let $x_t \in \mathbb{R}^2$ be the two-dimensional coordinate of the $t$-th fixation during unconstrained scene exploration. In a reference-centered dynamic scanning strategy, the gaze plan at time $t$ is determined not solely by the raw scene content but with respect to $x_0$ (or, more generally, a set of prior fixation points), with the current and future fixations analyzed in coordinates relative to this origin.

Key variables and notations:

$x_0^x$, $x_t^x$: Horizontal (x-axis) components of the initial and $t$-th fixation locations.
$\mu(t) = \frac{1}{N} \sum_{i=1}^N x_i^x(t)$: Mean horizontal gaze position at time $t$, averaged over $N$ trials, where $x_i^x(t)$ is the horizontal gaze position in trial $i$.
$\Delta(t) = \mu(t) - x_0^x$: Displacement of the average gaze from the reference.
$O = \max_t |\Delta(t)|$: Overshoot magnitude, the maximum deviation from the reference over time.

This specification enables the computation of scanpath dynamics in a manner explicitly anchored to the initial fixation.
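As a minimal sketch of these definitions, assume the horizontal gaze positions have already been resampled onto a common time grid and stored as an array of shape (trials, time samples). The function name `reference_centered_metrics` and this array layout are illustrative choices, not part of the original analysis.

```python
import numpy as np

def reference_centered_metrics(gaze_x, x0):
    """Compute mu(t), Delta(t) and O for one start condition.

    gaze_x : array of shape (N_trials, T), horizontal gaze position
             per trial and time sample (assumed layout).
    x0     : horizontal coordinate of the initial fixation, x_0^x.
    """
    gaze_x = np.asarray(gaze_x, dtype=float)
    mu = gaze_x.mean(axis=0)          # mu(t): average over the N trials
    delta = mu - x0                   # Delta(t): displacement from the reference
    overshoot = np.abs(delta).max()   # O: largest absolute displacement over time
    return mu, delta, overshoot

# Toy data: two trials, four time samples, both starting at x = 5.
gaze_x = np.array([[5.0, 10.0, 22.0, 18.0],
                   [5.0, 14.0, 26.0, 20.0]])
mu, delta, O = reference_centered_metrics(gaze_x, x0=5.0)
print(mu, delta, O)
```

Keeping $x_0^x$ as an explicit argument makes it straightforward to run the same computation separately for left-start and right-start trials.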

2. Metrics for Quantifying Scanpath Dynamics

The principal metrics for such strategies focus on the temporal evolution of the scanpath relative to the starting position:

Metric	Definition	Significance
Mean position $\mu(t)$	$\frac{1}{N} \sum_{i=1}^N x_i^x(t)$	Population-averaged gaze location
Transient $\Delta(t)$	$\mu(t) - x_0^x$	Deviation from starting position
Overshoot $O$	$\max_t |\Delta(t)|$	Maximum excursion relative to reference

In empirical analyses, $\mu(t)$ is typically smoothed over a millisecond grid using a Gaussian kernel ($\sigma \approx 100$ ms) to capture the continuous time course of gaze shifts. The evolution of $\Delta(t)$ signals how strongly the initial fixation continues to bias behavior, and whether systematic excursions (overshoots) in gaze occur.
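One way to realize this smoothing is a Gaussian-weighted average in time (a kernel-regression estimate). The sketch below assumes fixations from all trials of one start condition are pooled as (time, horizontal position) samples. The function name `smoothed_mean_position` and the pooling scheme are assumptions, and the original analysis may have implemented the kernel differently.

```python
import numpy as np

def smoothed_mean_position(times_ms, x_pos, grid_ms, sigma_ms=100.0):
    """Gaussian-kernel estimate of mu(t) on a time grid.

    times_ms : sample times (e.g. fixation onsets) pooled over trials
    x_pos    : horizontal gaze position of each sample
    grid_ms  : times at which mu(t) is evaluated (e.g. every millisecond)
    sigma_ms : kernel width; 100 ms follows the value quoted in the text
    """
    times_ms = np.asarray(times_ms, dtype=float)
    x_pos = np.asarray(x_pos, dtype=float)
    grid_ms = np.asarray(grid_ms, dtype=float)
    # weights[g, k]: Gaussian weight of sample k when evaluating grid time g
    diff = grid_ms[:, None] - times_ms[None, :]
    weights = np.exp(-0.5 * (diff / sigma_ms) ** 2)
    num = weights @ x_pos
    den = weights.sum(axis=1)
    # Grid points with no nearby samples get NaN instead of a division error.
    return np.divide(num, den, out=np.full_like(den, np.nan), where=den > 0)

grid = np.arange(0, 5001)  # 5 s of viewing on a 1 ms grid
mu = smoothed_mean_position([150, 420, 700, 990], [5.0, 9.0, 16.0, 21.0], grid)
```

Subtracting $x_0^x$ from the result gives a smoothed $\Delta(t)$.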

3. Experimental Protocols and Scene Structure

The reference-centered dynamic scanning strategy was interrogated through a controlled protocol:

Stimuli: 64 colored photographs, including both “object-based” and “pattern” images. Object-based scenes were further classified via computed saliency maps (Graph-Based Visual Saliency, Judd model, and empirical density) as balanced, left-salient, or right-salient.
Participants: 28 observers (20 F, 8 M), normal/corrected vision.
Procedure: Each trial began with a fixation cross placed 5.6° from the left or right image border (i.e., a controlled $x_0$), which had to be fixated for 1 s under gaze monitoring. This was followed by free scene viewing ($\sim$5 s), then a memory test.
Data collection: Approximately 47,330 fixations were recorded, with saccades detected by a velocity-threshold algorithm (minimum amplitude 0.58°).

This protocol enables direct measurement of how initial spatial reference determines subsequent scanpath dynamics for different scene types and saliency structures.
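The saccade-detection step in the data collection can be illustrated with a simplified fixed-threshold detector. This is a sketch only: the text gives the minimum amplitude (0.58°) but not the velocity threshold, the sampling rate, or the exact algorithm, so `vel_thresh_deg_s`, `rate_hz`, and the function name `detect_saccades` are hypothetical.

```python
import numpy as np

def detect_saccades(x_deg, y_deg, rate_hz, vel_thresh_deg_s, min_amp_deg=0.58):
    """Return (start_index, end_index, amplitude_deg) for each detected saccade.

    A sample counts as 'fast' when its 2-D speed exceeds a fixed threshold;
    each run of consecutive fast samples is a saccade candidate, kept only if
    its start-to-end amplitude reaches min_amp_deg.
    """
    x = np.asarray(x_deg, dtype=float)
    y = np.asarray(y_deg, dtype=float)
    vx = np.gradient(x) * rate_hz          # deg/sample -> deg/s
    vy = np.gradient(y) * rate_hz
    fast = np.hypot(vx, vy) > vel_thresh_deg_s

    # Locate the boundaries of runs of consecutive fast samples.
    padded = np.concatenate(([False], fast, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2] - 1   # inclusive sample indices

    saccades = []
    for s, e in zip(starts, ends):
        amplitude = float(np.hypot(x[e] - x[s], y[e] - y[s]))
        if amplitude >= min_amp_deg:
            saccades.append((int(s), int(e), amplitude))
    return saccades

# Synthetic trace: 10 samples at 0 deg, a 5 deg rightward ramp, then 10 samples at 5 deg.
x = np.concatenate([np.zeros(10), np.arange(1.0, 6.0), np.full(10, 5.0)])
y = np.zeros_like(x)
print(detect_saccades(x, y, rate_hz=1000.0, vel_thresh_deg_s=100.0))
```

Fixations are then the intervals between detected saccades.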

4. Computational Models: Statistical and Dynamical Approaches

To elucidate the mechanisms underlying reference-centered scanpaths and account for the observed transients/overshoots, multiple models were implemented:

4.1 Statistical Control Models

Density-map model: Each fixation $x_t$ is sampled independently from the empirical fixation probability distribution $p_\text{emp}(x)$ (estimated by kernel smoothing of the human data).
- $x_t \sim p_\text{emp}(x),\ \forall t$
Gaussian-weighted density map: Introduces an attentional window (width $\sigma_a = 4.88^\circ$) centered on the previous fixation $x_{t-1}$:
- $p(x \mid x_{t-1}) \propto \exp\left( -\frac{\|x - x_{t-1}\|^2}{2\sigma_a^2} \right) \cdot p_\text{emp}(x)$.
Saccadic-momentum model: The first two saccades are replayed from the human sequence; subsequent saccades are sampled from the empirical joint distribution of amplitudes and directions $P(r, \theta)$.

These approaches lack dynamic memory or direct suppression of previously-fixated locations.
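The first two control models can be sketched on a discretized density map. The example assumes $p_\text{emp}$ is available as a non-negative 2-D array and that $\sigma_a$ has been converted from degrees to grid cells; the function names and the grid representation are illustrative assumptions.

```python
import numpy as np

def sample_density(p_emp, rng):
    """Density-map model: draw one fixation cell (row, col) from p_emp."""
    p = p_emp.ravel() / p_emp.sum()          # normalize to a probability vector
    idx = rng.choice(p.size, p=p)
    return np.unravel_index(idx, p_emp.shape)

def sample_gaussian_weighted(p_emp, prev, sigma_cells, rng):
    """Gaussian-weighted model: p(x | x_{t-1}) ∝ exp(-|x - x_{t-1}|^2 / (2 sigma^2)) * p_emp(x)."""
    rows, cols = np.indices(p_emp.shape)
    d2 = (rows - prev[0]) ** 2 + (cols - prev[1]) ** 2
    window = np.exp(-d2 / (2.0 * sigma_cells ** 2))   # attentional window around prev
    return sample_density(window * p_emp, rng)

rng = np.random.default_rng(0)
p_emp = rng.random((24, 32))                  # stand-in for an empirical density map
fix = (12, 4)                                 # start near the left border
scanpath = [fix]
for _ in range(10):
    fix = sample_gaussian_weighted(p_emp, fix, sigma_cells=4.0, rng=rng)
    scanpath.append(fix)
```

Neither sampler keeps any record of earlier fixations beyond $x_{t-1}$, which is the limitation the dynamical model addresses.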

4.2 Dynamical Model: SceneWalk with Inhibitory Tagging

Activation maps: Maintains two grids: a fixation/inhibition map $f_{ij}(t)$ and an attention map $a_{ij}(t)$ over pixel/region indices $(i, j)$.
- Inhibition update: $f_{ij}(t + 1) = (1 - \lambda) f_{ij}(t) + \alpha w(i - i_t, j - j_t)$, with $w$ a radial Gaussian ($\sigma_I = 4.88^\circ$); $\alpha$ is the per-fixation activation and $\lambda$ is the decay rate.
- Attention map: Driven by static scene saliency $S_{ij}$, decays over time.
Potential function: $u_{ij}(t) = -\frac{a_{ij}(t)}{\sum_{kl}a_{kl}(t)} + \frac{[f_{ij}(t)]^\gamma}{\sum_{kl} [f_{kl}(t)]^\gamma}$
- Nonlinear exponent $\gamma$ (tuned from $0.3$ to $0.2$).
Target selection: Among cells with $u_{ij}(t) < 0$, select with probability $\pi_{ij}(t) \propto \max\left\{ u_{ij}(t)/\sum_{kl} u_{kl}(t), \eta \right\}$, with small $\eta$ for noise.

Of the models considered, only the dynamical, inhibitory-tagging framework captures the core features observed in empirical scanpaths.
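The update, potential, and selection rules above can be combined into a single simulation step. The following is a hedged sketch rather than the published SceneWalk implementation. The text gives no formula for the attention map, so the leaky saliency-driven update below is an assumption, as are the function names, the parameter values (`lam`, `alpha`, `rho`, `eta`, and the widths in grid cells), and the choice to take the sum in the selection rule over the negative-potential cells only.

```python
import numpy as np

def gaussian_window(shape, center, sigma):
    """Radial Gaussian w centered on a grid cell (sigma in cells)."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def scenewalk_step(f, a, saliency, fix, rng, *, lam=0.05, alpha=1.0, rho=0.5,
                   sigma_i=3.0, sigma_a=3.0, gamma=0.2, eta=1e-4):
    """One fixation-to-fixation step; returns updated maps and the next target."""
    # Inhibition update from the text: f(t+1) = (1 - lam) f(t) + alpha * w.
    f = (1.0 - lam) * f + alpha * gaussian_window(f.shape, fix, sigma_i)
    # ASSUMPTION: attention relaxes toward saliency seen through a Gaussian window.
    drive = saliency * gaussian_window(a.shape, fix, sigma_a)
    a = (1.0 - rho) * a + rho * drive / drive.sum()
    # Potential: u = -a / sum(a) + f^gamma / sum(f^gamma).
    fg = f ** gamma
    u = -a / a.sum() + fg / fg.sum()
    # Target selection among cells with u < 0 (ASSUMPTION: sum over those cells).
    # u sums to zero over the grid, so negative cells exist unless u is all zero.
    neg = u < 0
    pi = np.zeros_like(u)
    pi[neg] = np.maximum(u[neg] / u[neg].sum(), eta)
    pi /= pi.sum()
    idx = rng.choice(pi.size, p=pi.ravel())
    return f, a, np.unravel_index(idx, u.shape)

rng = np.random.default_rng(0)
saliency = np.ones((20, 30))                  # flat stand-in saliency map
f = np.zeros_like(saliency)
a = np.zeros_like(saliency)
fix = (10, 3)                                 # controlled start near the left border
path = [fix]
for _ in range(8):
    f, a, fix = scenewalk_step(f, a, saliency, fix, rng)
    path.append(fix)
```

The parameter values here are placeholders chosen so the example runs; they are not fitted values from the study.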

5. Empirical Results and Model Comparisons

Analysis of human scanpath data under manipulated initial fixation reveals:

Persistent transients: The mean gaze position $\mu(t)$ remains biased toward the starting side for 3–5 seconds, substantially beyond single-saccade effects. This demonstrates sustained influence of the reference point.
Systematic overshoot: In nearly all scene types, $\Delta(t)$ crosses zero at $t \approx 1.5$–$2$ s and peaks on the side opposite the entry fixation before stabilizing; $O$ (overshoot magnitude) is typically 20–30% of image width.
Model performance:
- Density-maps and Gaussian-weighted models fail to exhibit overshoot, instead relaxing slowly or asymptoting toward neutral.
- Saccadic-momentum model can reproduce initial momentum-driven transients, but does not generate overshoot phenomena.
- SceneWalk (with inhibitory tagging and $\gamma = 0.2$) matches both time course and overshoot quantitatively across all (start $\times$ scene) combinations.

These outcomes underscore that a dynamical reference-centered strategy with active inhibition of prior fixations, as implemented in SceneWalk, is required to explain human scanpath properties.

6. Mechanistic Insights and Applications

The overshoot phenomenon, in which mean gaze position migrates beyond the image center and temporarily favors the side opposite the starting fixation, cannot be explained by static density or saccadic momentum alone. It requires an active inhibitory process anchored on the reference location ($x_0$) and subsequent fixations. Inhibitory tagging acts as a dynamic memory, biasing attentional allocation and saccade planning away from already-inspected scene regions and systematically promoting exploration of novel areas.

A plausible implication is that reference-centered inhibitory mechanisms optimize scanpath efficiency by reducing redundant revisits and fostering thorough exploration. In applied domains beyond biological vision, any system (such as a robotic visual sensor or adaptive human-machine interface) that requires dynamic, efficient scene sampling can benefit from maintaining an inhibition map centered on current or recent fixation points. This approach supports principled allocation of sensing or attentional resources to unexplored or unpredictable regions, implementing an efficient, reference-centered dynamic scanning strategy in artificial domains.

References (1)

Rothkegel et al. (2016). Influence of initial fixation position in scene viewing.
