Dynamic Optical Test for Bot Identification (DOT-BI)
- The paper introduces DOT-BI, a dynamic optical test that integrates moving noise textures with masked digits to differentiate human vision from automated systems.
- It employs differential motion scaling and translation to create perceptual disparities between the human visual system and algorithmic processing.
- Empirical validations reveal near-perfect human success rates and strong resistance to current automated decoding techniques.
The Dynamic Optical Test for Bot Identification (DOT-BI) is a lightweight perceptual-motion CAPTCHA that exploits the human visual system's ability to segment moving shapes against identically textured backgrounds, enabling the differentiation of human respondents from automated systems in online surveys and processes. DOT-BI presents a masked number concealed in a dynamic binary noise field: differences in motion and scale between the number and its background render the target perceptible to human observers but not to conventional algorithmic or multimodal vision models (Bleeker et al., 3 Dec 2025).
1. Mechanism and Generation Pipeline
DOT-BI is constructed by generating a binary random noise texture $N(x,y) \in \{0,1\}$, typically sampled i.i.d. from $\mathrm{Bernoulli}(0.5)$ on an image grid of size $W \times H$. The target number is rendered as a binary mask $M(x,y) \in \{0,1\}$, which defines the spatial location of the number. For each frame $t$ in an animation sequence of length $T$, pixel values are given by

$$F_t(x,y) = M(x,y)\,N_e^{(t)}(x,y) + \bigl(1 - M(x,y)\bigr)\,N_b^{(t)}(x,y),$$

where $N_e^{(t)}$ and $N_b^{(t)}$ are identical random textures sharing the same initial seed but manipulated through independent spatial translations and (optionally) scale transformations over time. The background is shifted according to $\Delta_b(t) = v_b\,t$, and the target element follows $\Delta_e(t) = v_e\,t$, with scale parameters $s_b(t) = 1 + \alpha t$ and $s_e(t) = 1 + \beta t$ modulating the frame-wise scaling ($\alpha$ for the background, $\beta$ for the element). The final output consists of $T$ frames, encoded as a looping GIF or a short video suitable for web deployment.
Humans rapidly perceive the embedded number as a salient figure via global motion contrast; in contrast, framewise inspection by algorithmic agents reveals only stochastic visual noise, with the target information temporally encoded across the sequence. Effective decoding by a bot would require sophisticated multi-frame pixel alignment and differencing across unknown transformation parameters, which is computationally expensive and unreliable for generic, unsupervised model architectures.
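A minimal NumPy sketch of this figure/ground composition (function and parameter names are illustrative, not from the paper; cyclic `np.roll` translation stands in for the full translation-plus-scaling warp):

```python
import numpy as np

def generate_dotbi_frames(mask, T=30, v_b=(1, 0), v_e=(0, 2), seed=0):
    """Compose T frames: figure and ground share one binary noise texture
    but are translated with different per-frame velocities (sketch only)."""
    rng = np.random.default_rng(seed)
    H, W = mask.shape
    texture = rng.integers(0, 2, size=(H, W), dtype=np.uint8)  # binary noise
    frames = []
    for t in range(T):
        # np.roll gives a cyclic translation; a full renderer would also
        # apply the optional scale factors s_b(t), s_e(t)
        bg = np.roll(texture, (v_b[0] * t, v_b[1] * t), axis=(0, 1))
        el = np.roll(texture, (v_e[0] * t, v_e[1] * t), axis=(0, 1))
        frames.append(np.where(mask == 1, el, bg))
    return frames
```

Any single frame is indistinguishable from pure noise; only the differential motion across frames makes the masked region pop out.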
2. Mathematical and Algorithmic Principles
The differential detectability between human and machine vision in DOT-BI is grounded in spatiotemporal encoding of the signal. To extract the hidden number algorithmically, a method must perform cross-frame alignment by optimizing

$$(\hat{\Delta}, \hat{s}) = \arg\min_{\Delta, s} \sum_{t} \bigl\lVert F_{t+1} - \mathcal{T}_{\Delta, s}[F_t] \bigr\rVert^2,$$

where $\mathcal{T}_{\Delta, s}$ denotes a translate-and-scale warp, and maximize inter-frame correlations

$$\rho_t(\Delta, s) = \operatorname{corr}\bigl(F_{t+1}, \mathcal{T}_{\Delta, s}[F_t]\bigr)$$

across latent translation and scaling parameters. The signal-to-noise ratio (SNR) for human perception is characterized by

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{(x,y) \in M} \lvert \Delta F_t(x,y) \rvert^2}{\sum_{(x,y) \notin M} \lvert \Delta F_t(x,y) \rvert^2},$$

where $(x,y) \in M$ spans pixels within the mask, $(x,y) \notin M$ refers to the background, and $\Delta F_t$ is the motion-compensated frame difference. Motion/shifting parameters $(v_b, v_e, \alpha, \beta)$ are established such that the SNR exceeds the perceptual threshold (≈6 dB), while practical bot-based estimation remains infeasible. Perceptual thresholds are achievable with framewise shifts of 1–2 px at display resolutions of 50–70 dpi and frame rates of 24–30 fps.
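The alignment principle can be made concrete with a sketch (illustrative, assuming purely translational motion and a background velocity that is known to the attacker, which in practice it is not): undoing the background shift before differencing cancels the shared ground texture exactly, so residual energy localizes on the digit and the inside/outside energy ratio far exceeds the ≈6 dB threshold.

```python
import numpy as np

def aligned_residual(frames, v_b):
    """Undo the (here assumed known) background translation, then difference
    consecutive frames; the shared ground texture cancels exactly."""
    resid = np.zeros(frames[0].shape, dtype=float)
    for t in range(len(frames) - 1):
        undone = np.roll(frames[t + 1], (-v_b[0], -v_b[1]), axis=(0, 1))
        resid += np.abs(undone.astype(float) - frames[t].astype(float))
    return resid

def snr_db(resid, mask):
    """Residual energy inside the digit mask vs. outside, in decibels."""
    inside = float(np.mean(resid[mask == 1] ** 2))
    outside = float(np.mean(resid[mask == 0] ** 2)) + 1e-12
    return 10.0 * np.log10(inside / outside)
```

The security of the scheme rests precisely on the attacker having to search over the unknown $(\Delta, s)$ parameters rather than being handed `v_b` as above.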
3. Empirical Validation
DOT-BI’s effectiveness is demonstrated via both adversarial AI evaluation and human user studies. In tests against state-of-the-art, video-capable multimodal models (Gemini 2.5 Pro, GPT-5-Thinking), neither model correctly extracted the concealed code in any of its 10 trials, regardless of whether the motion-based mechanism was disclosed in the prompt. Gemini 2.5 Pro hallucinated responses, while GPT-5-Thinking returned timeouts, indicating a reliance on frame-independent processing pipelines with inadequate temporal-alignment capabilities for this task.
In human subject testing, an online survey (n = 182 managers, Prolific) yielded a 99.5% solution rate (181/182) with an average total completion time of 10.7 seconds (SD ≈ 3.2 s). In a supervised lab study (n = 39; DOT-BI group = 27, control = 12), the DOT-BI group achieved 100% success. Mean instruction dwell time was 21.3 seconds. Perceived ease of use did not differ significantly between groups (Likert 1–7: control = 5.8, DOT-BI = 6.3; U = 82.5, Z = 2.40, p = .016, not significant at the study's adjusted significance level). No significant effects on survey completion time or performance on parallel attention checks were observed (Bleeker et al., 3 Dec 2025).
4. Implementation and Deployment Guidelines
A high-level pseudocode specification defines the core DOT-BI generation:
function generateDOTBI(number, seed, params):
texture ← genNoise(W, H, seed)
mask ← renderTextMask(number, font, position)
for t in 0…T−1:
Δb ← params.v_b · t ; s_b ← 1 + params.α · t
Δe ← params.v_e · t ; s_e ← 1 (or params.β · t)
frame ← composite(texture, mask, Δb, s_b, Δe, s_e)
append frame to sequence
    return encodeGIF(sequence, framerate=params.fps)
Key implementation parameters include:
- seed: ensures a unique noise instance per test,
- $v_b$, $v_e$: 2D translation velocities for background and element,
- $\alpha$, $\beta$: per-frame scale increments,
- $T$, fps: sequence length and playback rate,
- font, mask antialiasing: legibility and presentation.
Integration into web surveys consists of pre-rendering GIF or WebM files (100+ variants are available), embedding them with standard <img> or <video> tags, and pairing them with numeric-entry fields. Variant delivery can be randomized, and server-side validation is performed using the known sequence seed. It is recommended to rotate variants regularly and to tie instance seeds to session credentials (Bleeker et al., 3 Dec 2025).
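A server-side sketch of the session-bound validation described above (the storage layout, identifiers, and `VARIANTS` table are hypothetical, not from the reference implementation):

```python
import hmac
import secrets

# Hypothetical mapping of pre-rendered variant id -> expected digit string,
# known at render time from the variant's seed.
VARIANTS = {"v017": "4831", "v052": "9206"}

def issue_challenge(session_store, session_id):
    """Bind a randomly chosen variant to the session, so answers replayed
    from other sessions do not validate."""
    variant_id = secrets.choice(list(VARIANTS))
    session_store[session_id] = variant_id
    return variant_id  # serve the matching GIF/WebM to this client

def validate_answer(session_store, session_id, submitted):
    """Single-use, constant-time check against the session's expected code."""
    variant_id = session_store.pop(session_id, None)  # consume the challenge
    if variant_id is None:
        return False
    return hmac.compare_digest(VARIANTS[variant_id], submitted.strip())
```

Popping the challenge on first use gives basic replay protection; `hmac.compare_digest` avoids timing side channels on the comparison.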
5. Limitations and Adversarial Considerations
DOT-BI is robust against current frame-wise or translationally invariant vision algorithms and multimodal AI models; however, future bots could potentially leverage optical-flow networks or exhaustive cross-correlation techniques to reconstruct the transformation parameters $(v_b, v_e, \alpha, \beta)$. Additional considerations include:
- Photosensitive epilepsy risk: Motion warnings should be displayed for sensitive populations.
- Accessibility: Users with impaired vision or slower cognitive processing may require extended exposure or alternative CAPTCHA mechanisms.
Possible adversarial countermeasures include introducing nonlinear element trajectories (e.g., rotation or oscillation), varying mask opacity or color channels (chromatic noise), and dynamically generating challenge tokens to prevent replay attacks.
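As one example of such a countermeasure, a sinusoidal element trajectory (amplitude and period below are illustrative values, not from the paper) replaces the constant-velocity shift $v_e\,t$, so that no single translation hypothesis aligns all frames:

```python
import math

def oscillating_shift(t, amplitude=3.0, period=24):
    """Nonlinear element trajectory: a circular oscillation whose per-frame
    displacement keeps changing, defeating constant-velocity alignment."""
    return (round(amplitude * math.sin(2 * math.pi * t / period)),
            round(amplitude * math.cos(2 * math.pi * t / period)))
```

An attacker must then estimate a per-frame displacement field rather than a single velocity, which substantially enlarges the search space of the alignment objective.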
6. Practical Recommendations for Secure Deployment
Deployment should implement:
- Frequent variant rotation and per-session seeding,
- Logging failed attempts for anomaly and replay detection,
- Combining with behavioral timing analysis (e.g., reading patterns) as an additional authentication factor,
- Guaranteed accessible alternatives for users unable to process rapidly moving visual stimuli.
Reference implementation and pre-rendered assets are available at https://github.com/MalteBleeker/DOT-BI, facilitating rapid deployment in research and commercial survey pipelines (Bleeker et al., 3 Dec 2025).