Reference-Centered Dynamic Scanning
- Reference-Centered Dynamic Scanning Strategy is defined as a scanning framework where gaze is allocated relative to an initial fixation point, integrating scene saliency with dynamic spatial references.
- It employs inhibitory tagging to actively suppress revisits to previously fixated areas, resulting in measurable transients and overshoots in gaze trajectories.
- Computational models like SceneWalk validate that incorporating a reference-centered approach enhances the prediction of human scanpaths compared to static density or momentum models.
A reference-centered dynamic scanning strategy is a paradigm for sequential allocation of visual (or sensor) fixations in which each planned gaze position is selected not only according to the underlying scene structure (such as saliency maps) but also relative to an explicit, dynamically updated spatial reference, typically the initial fixation position. The scanpath is influenced by the spatiotemporal history of gaze: the trajectory evolves in coordinates centered on a reference point, and oculomotor planning incorporates mechanisms that discourage refixation of already-visited areas. This framework is formalized and empirically tested by Rothkegel et al. (2016), who quantify and model how experimentally manipulated initial fixations govern the subsequent dynamics of human eye movements in natural scene viewing. Central to this account is inhibitory tagging, whereby previously fixated locations acquire a time-varying suppression in attentional priority, producing characteristic temporal transients and systematic “overshoots” in fixation distributions.
1. Formal Definition and Notation
Let $\mathbf{x}_0$ denote the initial fixation position, which is experimentally controlled, and let $\mathbf{x}_n$ be the two-dimensional coordinate of the $n$-th fixation during unconstrained scene exploration. In a reference-centered dynamic scanning strategy, the gaze plan at time $t$ is determined not solely by the raw scene content but with respect to $\mathbf{x}_0$ (or, more generally, a set of prior fixation points), with the current and future fixations analyzed in coordinates relative to this origin.
Key variables and notations:
- $x_0$, $x_n$: Horizontal (x-axis) components of the initial and $n$-th fixation locations.
- $\bar{x}(t)$: Mean horizontal gaze position at time $t$ (averaged over trials).
- $\Delta x(t) = \bar{x}(t) - x_0$: Displacement of average gaze from the reference.
- $\Delta x_{\max} = \max_t \lvert \Delta x(t) \rvert$: Overshoot magnitude, representing the maximum deviation from reference over time.
This specification enables the computation of scanpath dynamics in a manner explicitly anchored to the initial fixation.
2. Metrics for Quantifying Scanpath Dynamics
The principal metrics for such strategies focus on the temporal evolution of the scanpath relative to the starting position:
| Metric | Definition | Significance |
|---|---|---|
| Mean position | $\bar{x}(t)$ | Population-averaged gaze location |
| Transient | $\Delta x(t) = \bar{x}(t) - x_0$ | Deviation from starting position |
| Overshoot | $\Delta x_{\max} = \max_t \lvert \Delta x(t) \rvert$ | Maximum excursion relative to reference |
In empirical analyses, the mean horizontal gaze position $\bar{x}(t)$ is typically smoothed over a millisecond grid using a Gaussian kernel (bandwidth specified in ms) to capture the continuous time course of gaze shifts. The evolution of $\bar{x}(t)$ signals how strongly the initial fixation continues to bias behavior, and whether systematic excursions (overshoots) in gaze occur.
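The smoothing and the two derived metrics can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes fixations from all trials are pooled into flat arrays of onset times and horizontal positions, uses a kernel-weighted average (Nadaraya-Watson style) as the smoother, and treats the bandwidth `sigma_ms` as a free parameter. The function and argument names are hypothetical.

```python
import numpy as np

def smoothed_mean_position(onsets_ms, x_positions, t_grid_ms, sigma_ms):
    """Kernel-weighted mean horizontal gaze position on a time grid.

    onsets_ms, x_positions: 1-D arrays pooling fixations from all trials.
    sigma_ms: Gaussian kernel bandwidth (an assumed free parameter).
    """
    onsets_ms = np.asarray(onsets_ms, dtype=float)
    x_positions = np.asarray(x_positions, dtype=float)
    t_grid_ms = np.asarray(t_grid_ms, dtype=float)
    # weights[k, i] is the influence of fixation i on grid point k.
    diff = t_grid_ms[:, None] - onsets_ms[None, :]
    weights = np.exp(-0.5 * (diff / sigma_ms) ** 2)
    # Note: a grid point far from every fixation can underflow to zero weight.
    return (weights @ x_positions) / weights.sum(axis=1)

def transient_and_overshoot(mean_x, x_ref):
    """Displacement from the reference over time and its maximum magnitude."""
    delta = np.asarray(mean_x, dtype=float) - x_ref
    return delta, float(np.max(np.abs(delta)))

# Synthetic example: gaze drifts from a start at x = 5 toward x = 25.
t_grid = np.arange(0, 5000, 10)
onsets = np.linspace(0, 5000, 200)
xs = 5.0 + 20.0 * (1.0 - np.exp(-onsets / 1000.0))
mean_x = smoothed_mean_position(onsets, xs, t_grid, sigma_ms=100.0)
delta, overshoot = transient_and_overshoot(mean_x, x_ref=5.0)
```

Pooling by fixation onset ignores fixation durations; a duration-weighted variant would sample gaze position at every millisecond instead.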
3. Experimental Protocols and Scene Structure
The reference-centered dynamic scanning strategy was interrogated through a controlled protocol:
- Stimuli: 64 colored photographs, including both “object-based” and “pattern” images. Object-based scenes were further classified via computed saliency maps (Graph-Based Visual Saliency, Judd model, and empirical density) as balanced, left-salient, or right-salient.
- Participants: 28 observers (20 F, 8 M) with normal or corrected-to-normal vision.
- Procedure: Each trial began with a fixation cross placed 5.6° from the left or right image border (i.e., a controlled initial fixation $\mathbf{x}_0$), held for 1 s under gaze monitoring. This was followed by free scene viewing (5 s), then a memory test.
- Data collection: Approximately 47,330 fixations were recorded, with saccades detected by a velocity-threshold algorithm (minimum amplitude 0.58°).
This protocol enables direct measurement of how initial spatial reference determines subsequent scanpath dynamics for different scene types and saliency structures.
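To make the velocity-threshold step concrete, the sketch below shows one simple way such a detector can work. It is an assumption-laden illustration rather than the algorithm used in the study: it applies a fixed speed threshold, measures amplitude between the first and last supra-threshold samples, and takes the sampling rate and threshold as caller-supplied parameters. Only the minimum amplitude of 0.58° comes from the protocol above.

```python
import numpy as np

def detect_saccades(x_deg, y_deg, sample_rate_hz,
                    vel_threshold_deg_s, min_amplitude_deg=0.58):
    """Return (start_index, end_index, amplitude_deg) for each detected saccade.

    Fixed-threshold sketch: a saccade is a contiguous run of samples whose
    speed exceeds vel_threshold_deg_s and whose amplitude is large enough.
    """
    x = np.asarray(x_deg, dtype=float)
    y = np.asarray(y_deg, dtype=float)
    # Central-difference velocity in degrees per second.
    vx = np.gradient(x) * sample_rate_hz
    vy = np.gradient(y) * sample_rate_hz
    fast = np.hypot(vx, vy) > vel_threshold_deg_s
    # Locate run boundaries by padding with False on both sides.
    padded = np.concatenate(([False], fast, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[::2], changes[1::2] - 1
    saccades = []
    for s, e in zip(starts, ends):
        amplitude = float(np.hypot(x[e] - x[s], y[e] - y[s]))
        if amplitude >= min_amplitude_deg:
            saccades.append((int(s), int(e), amplitude))
    return saccades

# Synthetic trace: fixation, a 5 degree horizontal step over 10 samples, fixation.
x_trace = np.concatenate((np.zeros(50), np.arange(1, 11) * 0.5, np.full(50, 5.0)))
y_trace = np.zeros_like(x_trace)
events = detect_saccades(x_trace, y_trace, sample_rate_hz=500.0,
                         vel_threshold_deg_s=30.0)
```

Detectors used in practice often derive the threshold adaptively from the noise level of each trial; the fixed threshold here keeps the example short.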
4. Computational Models: Statistical and Dynamical Approaches
To elucidate the mechanisms underlying reference-centered scanpaths and account for the observed transients/overshoots, multiple models were implemented:
4.1 Statistical Control Models
- Density-map model: Each fixation is sampled independently from the empirical fixation probability distribution (estimated by kernel smoothing of the human data).
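- Gaussian-weighted density map: Introduces an attentional window (width $\sigma$) centered on the previous fixation $\mathbf{x}_{n-1}$, so that the next fixation is sampled from the empirical density $\rho$ weighted by a Gaussian: $p(\mathbf{x}_n \mid \mathbf{x}_{n-1}) \propto \rho(\mathbf{x}_n)\,\exp\!\left(-\lVert \mathbf{x}_n - \mathbf{x}_{n-1} \rVert^2 / 2\sigma^2\right)$.
- Saccadic-momentum model: The first two saccades are replayed from the human sequence; subsequent saccades are sampled from the empirical joint distribution of saccade amplitudes and directions.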
These approaches lack dynamic memory or direct suppression of previously-fixated locations.
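The first two control models can be sketched in a few lines. This is an illustrative reading of the descriptions above, assuming the fixation density is available as a 2-D grid and that the attentional window is a Gaussian in grid units; the function names and the `sigma` value are hypothetical.

```python
import numpy as np

def sample_density(density, rng):
    """Draw one (row, col) cell with probability proportional to density."""
    density = np.asarray(density, dtype=float)
    p = density.ravel() / density.sum()
    idx = rng.choice(p.size, p=p)
    return tuple(int(v) for v in np.unravel_index(idx, density.shape))

def sample_gaussian_weighted(density, prev_rc, sigma, rng):
    """Draw one cell from the density weighted by a Gaussian window
    centered on the previous fixation prev_rc = (row, col)."""
    density = np.asarray(density, dtype=float)
    rows, cols = np.indices(density.shape)
    d2 = (rows - prev_rc[0]) ** 2 + (cols - prev_rc[1]) ** 2
    # Note: if the window misses all non-zero density, the sum underflows to 0.
    weighted = density * np.exp(-d2 / (2.0 * sigma ** 2))
    return sample_density(weighted, rng)

rng = np.random.default_rng(0)
density = rng.random((20, 30))          # stand-in for an empirical density map
independent = [sample_density(density, rng) for _ in range(5)]

walk = [(10, 3)]                         # start near the left border
for _ in range(5):
    walk.append(sample_gaussian_weighted(density, walk[-1], sigma=4.0, rng=rng))
```

Neither sampler keeps any record of where gaze has already been, which is the property the dynamical model adds.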
4.2 Dynamical Model: SceneWalk with Inhibitory Tagging
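- Activation maps: The model maintains two grids over pixel/region indices $(i, j)$: a fixation/inhibition map $F_{ij}$ and an attention map $A_{ij}$.
- Inhibition update: At each fixation, $F$ decays and is incremented by a radial Gaussian of fixed width centered on the current fixation position. One parameter sets the per-fixation activation added to the map and another sets the decay rate.
- Attention map: $A$ is driven by the static scene saliency and decays over time.
- Potential function: $A$ and $F$ are combined into a single potential per location, in which inhibition counteracts attention.
- Nonlinear exponent: A tuned exponent $\gamma$ enters the potential.
- Target selection: Among candidate locations, the next target is selected with a probability determined by the potential, with a small additive term for noise.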
Among the models compared, only the dynamical, inhibitory-tagging framework captures the core features observed in empirical scanpaths.
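The interaction of the two maps can be illustrated with a small simulation. This is a sketch under stated assumptions, not the published SceneWalk implementation: the discrete-time update rules, the normalized difference used as the potential, the rectification step, and every parameter name and default value (`rho`, `decay_inhib`, `decay_attn`, `gamma`, `eta`, and both Gaussian widths) are choices made here for illustration.

```python
import numpy as np

def gaussian_bump(shape, center_rc, sigma):
    """Unnormalized radial Gaussian on a grid, centered at (row, col)."""
    rows, cols = np.indices(shape)
    d2 = (rows - center_rc[0]) ** 2 + (cols - center_rc[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def normalize(grid):
    """Scale a non-negative grid so that it sums to one."""
    return grid / grid.sum()

def simulate_scanpath(saliency, start_rc, n_fix, rng,
                      sigma_attn=4.0, sigma_inhib=2.0,
                      rho=1.0, decay_inhib=0.1, decay_attn=0.3,
                      gamma=1.0, eta=1e-6):
    """Illustrative inhibitory-tagging walk; all update rules are assumptions."""
    saliency = normalize(np.asarray(saliency, dtype=float))
    shape = saliency.shape
    attention = np.zeros(shape)
    inhibition = np.zeros(shape)
    pos = (int(start_rc[0]), int(start_rc[1]))
    path = [pos]
    for _ in range(n_fix - 1):
        # Inhibitory tagging: old tags decay, the current fixation adds a bump.
        inhibition = ((1.0 - decay_inhib) * inhibition
                      + rho * gaussian_bump(shape, pos, sigma_inhib))
        # Attention: decays, then is re-driven by saliency near current gaze.
        drive = normalize(saliency * gaussian_bump(shape, pos, sigma_attn))
        attention = (1.0 - decay_attn) * attention + decay_attn * drive
        # Potential: normalized attention minus normalized inhibition**gamma.
        potential = normalize(attention) - normalize(inhibition ** gamma)
        # Selection: keep positive potentials and add a small noise floor eta.
        prob = normalize(np.maximum(potential, 0.0) + eta)
        idx = rng.choice(prob.size, p=prob.ravel())
        pos = tuple(int(v) for v in np.unravel_index(idx, shape))
        path.append(pos)
    return path

rng = np.random.default_rng(1)
path = simulate_scanpath(np.ones((20, 30)), start_rc=(10, 3), n_fix=8, rng=rng)
print(path)
```

Because the inhibition bump is narrower than the attention window in these defaults, the potential is negative at and near the current fixation and positive in a surrounding ring, which pushes successive fixations away from recently visited cells.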
5. Empirical Results and Model Comparisons
Analysis of human scanpath data under manipulated initial fixation reveals:
- Persistent transients: The mean gaze position remains biased toward the starting side for 3–5 seconds, substantially beyond single-saccade effects. This demonstrates sustained influence of the reference point.
- Systematic overshoot: In nearly all scene types, the mean horizontal gaze position crosses zero within the first 2 s and peaks on the side opposite the entry fixation before stabilizing; the overshoot magnitude $\Delta x_{\max}$ is typically 20–30% of image width.
- Model performance:
  - The density-map and Gaussian-weighted models fail to exhibit overshoot, instead relaxing slowly or asymptoting toward neutral.
  - The saccadic-momentum model can reproduce initial momentum-driven transients, but does not generate overshoot.
  - SceneWalk (with inhibitory tagging and its tuned nonlinear exponent) matches both the time course and the overshoot quantitatively across all combinations of start position and scene type.
These outcomes underscore that a dynamical reference-centered strategy with active inhibition of prior fixations, as implemented in SceneWalk, is required to explain human scanpath properties.
6. Mechanistic Insights and Applications
The overshoot phenomenon, in which mean gaze position migrates beyond the image center and temporarily favors the side opposite the starting fixation, cannot be explained by static density or saccadic momentum alone. It requires an active inhibitory process anchored on the reference location ($\mathbf{x}_0$) and subsequent fixations. Inhibitory tagging acts as a dynamic memory, biasing attentional allocation and saccade planning away from already-inspected scene regions and systematically promoting exploration of novel areas.
A plausible implication is that reference-centered inhibitory mechanisms optimize scanpath efficiency by reducing redundant revisits and fostering thorough exploration. In applied domains beyond biological vision, any system (such as a robotic visual sensor or adaptive human-machine interface) that requires dynamic, efficient scene sampling can benefit from maintaining an inhibition map centered on current or recent fixation points. This approach supports principled allocation of sensing or attentional resources to unexplored or unpredictable regions, implementing an efficient, reference-centered dynamic scanning strategy in artificial domains.