BubbleView Interface for Visual Attention
- BubbleView is a crowdsourced moving-window interface that simulates peripheral vision by selectively revealing unblurred image regions through mouse clicks.
- It employs Gaussian blur and discrete click interactions to generate attention heatmaps, capturing key visual importance metrics across diverse image types.
- Tunable parameters such as blur strength, bubble radius, and viewing time enable cost-effective, scalable evaluation of visual attention with robust performance metrics.
BubbleView is a mouse-contingent, moving-window interface designed to crowdsource visual attention and image importance maps without specialized eye-tracking hardware. Users are presented with globally blurred images that simulate peripheral vision and click to reveal informative regions at full resolution, producing discrete attention measurements. BubbleView has been validated across multiple image domains (information visualizations, natural images, web pages, and graphic designs) and has been widely adopted as a scalable, cost-efficient proxy for eye movements in human attention studies (Kim et al., 2017, Newman et al., 2020).
1. Interface Mechanics and User Interaction
BubbleView presents each stimulus image initially blurred by a Gaussian kernel with standard deviation σ (typically 30–50 px, matching 1–2° visual angle). Text and fine details are rendered unreadable in the periphery, approximating the loss of acuity outside the fovea. Participants interact by clicking to reveal “bubbles”—circular regions of radius r (30–50 px) at the location of interest—displaying the original, unblurred image pixels within the bubble. Only one bubble is visible at any time; each new click re-blurs the previous bubble, enforcing spatially and temporally discrete sampling.
Task modalities include:
- Free-viewing: Participants have a fixed time T (e.g., 10–30 s) to explore and click at will, which produces unconstrained attention maps.
- Description: Participants must click to reveal regions while typing a description of the viewed content that meets a minimum length (e.g., 150 characters), which promotes sustained engagement.
BubbleView can run in discrete-click or continuous-reveal mode (the latter less common), with discrete clicks preferred for mapping visual attention (Kim et al., 2017, Newman et al., 2020).
2. Technical Architecture and Implementation
BubbleView is implemented in standard web technologies:
- HTML5 Canvas (or optionally CSS filter: blur()) is used to render the blurred overlay and draw high-resolution bubbles via canvas context clipping masks.
- JavaScript manages event listeners, interface state, and aggregation of click/timestamp logs. A typical handler processes a click by drawing the blurred image, then clipping and restoring the bubble region.
- Data Logging: For each stimulus presentation, BubbleView records the array of clicks $\{(x_i, y_i, t_i)\}$ (position and timestamp of each click), bubble parameters, and session metadata (worker ID, browser, screen dimensions, interaction time, and optional description text).
Blur is typically implemented as a separable Gaussian filter, and bubble radius is set based on the experimental protocol (Newman et al., 2020).
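The single-bubble reveal behavior can be illustrated outside the browser. The following Python sketch is not the BubbleView source: it assumes grayscale images stored as nested lists, and the name `reveal_bubble` is hypothetical. The actual interface performs the equivalent compositing with Canvas clipping in JavaScript.

```python
from typing import List

Image = List[List[float]]  # row-major grayscale image; an assumption of this sketch


def reveal_bubble(sharp: Image, blurred: Image, cx: int, cy: int, radius: int) -> Image:
    """Return the blurred image with one circular region restored from `sharp`.

    Each call starts from a fresh copy of `blurred`, so any earlier bubble is
    re-blurred and only one bubble is visible at a time.
    """
    height, width = len(sharp), len(sharp[0])
    frame = [row[:] for row in blurred]  # redraw the fully blurred image first
    # Visit only the bounding box of the bubble, clamped to the image borders.
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                frame[y][x] = sharp[y][x]  # inside the bubble: original pixel
    return frame
```

Because the function never mutates its inputs, a click handler built on it could call it once per click and display the returned frame, which mirrors the "draw blurred image, then restore the bubble region" order described for the Canvas handler.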
3. Parameter Tuning and Experimental Protocols
Key tunable parameters include:
- Blur strength (σ): 30–50 px, chosen so peripheral details are illegible. Lower σ reduces occlusion (fewer required clicks) but decreases foveation.
- Bubble radius (r): 30–50 px, balancing spatial accuracy (smaller r) versus participant effort (fewer clicks for larger r).
- Viewing time (T): e.g., 10 s for free-viewing natural scenes, longer for dense visualizations or description.
- Participant count: Stability in heatmaps is achieved with 10–15 unique users per image, recovering 97–98% of the limiting performance seen with more participants.
- Mobile scaling: Bubble sizes and blur parameters must be adjusted for display DPI and limited screen space.
Empirical guidelines recommend σ ≈ 40 px, r ≈ 40 px, and T ≈ 10 s for desktop free-viewing; for description tasks, σ and r are kept within 30–50 px and a minimum text length is enforced (Newman et al., 2020, Kim et al., 2017).
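One way to keep these parameters together in an experiment script is a small configuration object. The sketch below is illustrative only: the class `BubbleViewParams`, its field names, and the linear `scaled` rule for smaller displays are assumptions of this example, not part of the published tooling.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BubbleViewParams:
    """Hypothetical container for the tunable parameters named in the text."""

    blur_sigma_px: float = 40.0     # sigma: Gaussian blur strength
    bubble_radius_px: float = 40.0  # r: radius of the revealed bubble
    viewing_time_s: float = 10.0    # T: time allowed per image
    min_description_chars: int = 0  # 0 means no description is required

    def within_guidelines(self) -> bool:
        """True if sigma and r both fall in the quoted 30-50 px range."""
        return (30.0 <= self.blur_sigma_px <= 50.0
                and 30.0 <= self.bubble_radius_px <= 50.0)

    def scaled(self, factor: float) -> "BubbleViewParams":
        """Scale the pixel-based parameters, e.g. for a smaller display.

        Linear scaling is an assumption of this sketch, not a documented rule.
        """
        if factor <= 0:
            raise ValueError("factor must be positive")
        return replace(
            self,
            blur_sigma_px=self.blur_sigma_px * factor,
            bubble_radius_px=self.bubble_radius_px * factor,
        )


# Desktop free-viewing defaults taken from the guideline values.
FREE_VIEWING = BubbleViewParams()
# Description task: adds a minimum text length. The viewing time is left at
# the default only as a placeholder; the text says just that it is longer.
DESCRIPTION = replace(FREE_VIEWING, min_description_chars=150)
```

A scaled configuration can fall outside the desktop range (for example, `FREE_VIEWING.scaled(0.5)` gives σ = r = 20 px), which is why `within_guidelines` is a check rather than a hard constraint; a pilot study would decide whether such values are acceptable.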
4. Data Processing and Attention Map Generation
Collected click logs are first filtered to exclude participants with anomalously few clicks (fewer than a threshold of 2–10 per image) or excessive clicks (IQR outlier removal). The remaining clicks $\{(x_i, y_i)\}_{i=1}^{N}$ are aggregated across the image to generate a continuous attention heatmap $H(x, y)$, defined as:

$$H(x, y) = \sum_{i=1}^{N} \exp\left(-\frac{(x - x_i)^2 + (y - y_i)^2}{2\sigma_h^2}\right)$$

with $\sigma_h$ chosen to approximate positional uncertainty, typically set to match the bubble radius. The resulting $H$ is normalized for comparability and use in standard saliency evaluation frameworks (Newman et al., 2020).
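The filtering and aggregation steps can be sketched in a few lines of standard-library Python. Several details here are assumptions made for illustration: the function names, the Tukey-style upper fence of Q3 + 1.5 × IQR for "excessive" click counts, the default minimum of 2 clicks, and peak normalization to [0, 1] (the text does not specify which normalization is used).

```python
import math
from statistics import quantiles
from typing import Dict, List, Tuple

Click = Tuple[float, float]  # (x, y) in image pixels


def filter_participants(clicks_by_worker: Dict[str, List[Click]],
                        min_clicks: int = 2) -> Dict[str, List[Click]]:
    """Drop workers with too few clicks or with outlier-high click counts."""
    counts = [len(c) for c in clicks_by_worker.values()]
    upper = math.inf
    if len(counts) >= 2:  # quartiles need at least two data points
        q1, _, q3 = quantiles(counts, n=4)
        upper = q3 + 1.5 * (q3 - q1)  # assumed Tukey fence
    return {w: c for w, c in clicks_by_worker.items()
            if min_clicks <= len(c) <= upper}


def click_heatmap(clicks: List[Click], width: int, height: int,
                  sigma: float) -> List[List[float]]:
    """Sum one isotropic Gaussian per click, then normalize the peak to 1."""
    heat = [[0.0] * width for _ in range(height)]
    for cx, cy in clicks:
        for y in range(height):
            dy2 = (y - cy) ** 2
            for x in range(width):
                heat[y][x] += math.exp(-((x - cx) ** 2 + dy2) / (2 * sigma ** 2))
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat
```

The nested loops cost O(N · width · height) and are meant only to make the formula concrete; a practical pipeline would more likely accumulate clicks into a count image and apply a single Gaussian filter with an array library.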
5. Evaluation Metrics and Quantitative Performance
BubbleView’s output is evaluated using metrics standard in eye-tracking and saliency modeling:
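- Pearson Correlation Coefficient (CC): $\mathrm{CC}(H, F) = \dfrac{\operatorname{cov}(H, F)}{\sqrt{\operatorname{var}(H)\,\operatorname{var}(F)}}$, where $H$ is the BubbleView heatmap and $F$ is the ground-truth fixation map.
- Normalized Scanpath Saliency (NSS): $\mathrm{NSS} = \dfrac{1}{M} \sum_{j=1}^{M} \bar{H}(x_j, y_j)$, where $\bar{H}$ is the z-scored heatmap and $(x_j, y_j)$, $j = 1, \dots, M$, are the ground-truth fixation locations.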
- Area Under the ROC Curve (AUC) and Kullback–Leibler Divergence (KL) are also reported.
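The first two metrics are simple enough to compute directly. The sketch below uses the standard definitions of CC and NSS on maps stored as nested lists; the function names and the `(x, y)` fixation format are assumptions of this example, and AUC and KL are omitted for brevity.

```python
import math
from typing import List, Sequence, Tuple

Map = List[List[float]]  # row-major map; heatmap[y][x]


def _flatten(m: Map) -> List[float]:
    return [v for row in m for v in row]


def pearson_cc(a: Map, b: Map) -> float:
    """Pearson correlation between two maps of identical size."""
    xs, ys = _flatten(a), _flatten(b)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("CC is undefined for a constant map")
    return cov / (sx * sy)


def nss(heatmap: Map, fixations: Sequence[Tuple[int, int]]) -> float:
    """Mean z-scored heatmap value at the ground-truth fixation pixels."""
    vals = _flatten(heatmap)
    n = len(vals)
    mean = sum(vals) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
    if std == 0:
        raise ValueError("NSS is undefined for a constant map")
    return sum((heatmap[y][x] - mean) / std for x, y in fixations) / len(fixations)
```

CC compares two continuous maps and is symmetric, whereas NSS scores a continuous map against discrete fixation points, so the two can rank interfaces differently.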
On CAT2000 natural images, BubbleView achieves CC = 0.62 (72% of inter-observer consistency) and NSS = 1.58 (65% of human consistency) versus ground-truth fixations, outperforming ZoomMaps and ImportAnnots, and second only to CodeCharts (CC = 0.76, NSS = 2.00) (Newman et al., 2020). For information visualizations (MASSVIS, r = 24–40 px), CC ≈ 0.86 and NSS ≈ 1.2–1.3 (89% of IOC) are observed with only 10 clickers per image (Kim et al., 2017).
6. Comparative Analysis, Use Cases, and Limitations
Comparative Performance
| Interface | CC | NSS | Consistency (rel. human) | Minimum Participants | Cost per Image (USD) |
|---|---|---|---|---|---|
| BubbleView | 0.62 | 1.58 | 65–72% | 10–15 | 0.45 |
| CodeCharts | 0.76 | 2.00 | 82–90% | ~50 | (higher) |
| ZoomMaps | 0.59 | 1.37 | 61–64% | 10–15 | (medium) |
| ImportAnnots | 0.51 | 1.22 | 50–53% | 10–15 | (medium) |
BubbleView is most effective for:
- Rapid, low-cost mapping of approximate attention on desktop.
- Visualization and web design experiments requiring importance maps or region ranking.
- Tasks where lightweight, scalable deployment matters more than fine-grained temporal precision.
Limitations
- Artificially slows exploration (roughly 2–3× slower than passive gaze).
- Small bubbles may induce participant fatigue or encourage inattentive clicking.
- BubbleView often under-samples less salient pictorial regions if users selectively click only on recognizable or text-rich areas.
- Provides less temporal granularity and less naturalistic sequence data than eye tracking (Newman et al., 2020, Kim et al., 2017).
Recommended best practices include pre-task calibration, minimum click and description requirements, and pilot studies to optimize blur/bubble parameters for the specific image set.
7. Extensions, Applications, and Future Directions
BubbleView generalizes across static image types and supports both saliency approximation and importance ranking without specialized hardware or calibration overhead. It enables large-scale computational studies and human-in-the-loop annotation pipelines, and has been integrated into toolboxes such as TurkEyes (Newman et al., 2020).
Possible future directions include:
- Incorporation into collaborative or gamified crowdsourcing environments for simultaneous quality and attention data collection.
- Leveraging BubbleView-derived ground truth to train deep saliency models for underrepresented domains (e.g., medical images, user interfaces).
- Hybrid question-answering and visual search tasks in which bubble clicks guide both exploration and responses.
The source code and deployment framework are publicly available via massvis.mit.edu/bubbleview, facilitating easy customization and large-scale experiments (Kim et al., 2017, Newman et al., 2020).