Mouse vs AI: Robust Foraging Benchmark
- The paper introduces a benchmark comparing RL agents and mice in a unified 3D foraging task, focusing on visual robustness and neural representation alignment.
- It employs controlled visual perturbations and risk-aware interventions to evaluate generalization and adaptive behavior across biological and artificial foragers.
- Findings reveal that while mice maintain high performance under perturbation, AI agents require bioinspired mechanisms to bridge gaps in risk assessment and robust performance.
Mouse vs. AI: Robust Foraging Competition is a benchmark and research program targeting the comparative study of visual robustness, risk assessment, adaptive behavior, and neural representation in natural and artificial foragers. Building on the ecological, behavioral, and neurocomputational principles observed in mice, this framework evaluates reinforcement learning (RL) agents alongside biological agents in a unified set of visually guided foraging tasks, combining quantifiable robustness measures with large-scale neurophysiological alignment.
1. Benchmark Overview and Biological Motivation
The Mouse vs. AI: Robust Foraging Competition introduces a bioinspired benchmark for training and evaluating RL agents in a visually guided, target-driven foraging task implemented in a complex Unity-based 3D environment (Schneider et al., 17 Sep 2025). Two core dimensions delineate this challenge:
- Visual Robustness (Generalization): Agents are assessed based on their ability to maintain high foraging performance across a range of held-out, ecologically realistic visual perturbations. This addresses a critical gap, as biological systems—such as mice—demonstrate exceptional generalization and resilience to perceptual disruptions, whereas conventional RL agents often degrade catastrophically under domain shifts.
- Neural Alignment: Beyond observable behavior, agents are evaluated for the alignment between their internal representations and real mouse visual cortex activity, based on mesoscale two-photon calcium imaging of 19,000+ neurons during task performance.
The benchmark’s biological grounding comes from having head-fixed mice perform the identical foraging task (visually cued navigation with time-constrained target pursuit) in the same Unity-rendered environment, presented in VR on an omnidirectional treadmill (Schneider et al., 17 Sep 2025).
2. Experimental Structure and Evaluation Protocol
Task Structure:
- At each timestep, the agent (mouse or artificial) receives an 86×155 grayscale egocentric image input reflecting its field of view.
- The navigational goal is a visual target, randomly offset up to ±30°, always visible on spawn.
- Agents have 5 seconds per trial to intercept the target, with reward delivered upon successful interception (see the interface sketch after this list).
- Actions are continuous (translation forward/backward, lateral, rotation) and must be chosen based on visual input alone.
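For concreteness, the sketch below shows what this observation/action/reward interface could look like as a Gymnasium-style environment stub. The class name, time-step length, and the sparse success reward are assumptions for illustration; the actual competition environment is a Unity build with its own interface.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ForagingEnvStub(gym.Env):
    """Illustrative stand-in for the Unity foraging task's interface.

    Observations: 86x155 grayscale egocentric images.
    Actions: continuous (forward/backward, lateral, rotation).
    Episodes: capped at 5 s of simulated time.
    """

    def __init__(self, dt: float = 0.05, trial_seconds: float = 5.0):
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(86, 155, 1), dtype=np.uint8
        )
        # [forward/backward, lateral, rotation], each normalized to [-1, 1]
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.max_steps = int(trial_seconds / dt)
        self._t = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._t = 0
        # Target spawns visible, with a random angular offset of up to +/-30 degrees.
        self._target_angle = self.np_random.uniform(-30.0, 30.0)
        return self._render_obs(), {}

    def step(self, action):
        self._t += 1
        intercepted = self._check_interception(action)
        reward = 1.0 if intercepted else 0.0   # sparse success reward (assumed)
        terminated = intercepted
        truncated = self._t >= self.max_steps  # 5-second time limit
        return self._render_obs(), reward, terminated, truncated, {}

    def _render_obs(self):
        # Placeholder: the real benchmark renders a Unity 3D scene.
        return np.zeros((86, 155, 1), dtype=np.uint8)

    def _check_interception(self, action):
        return False  # placeholder dynamics
```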
Visual Robustness Regime:
- Agents train under "normal" conditions and a fog perturbation.
- Evaluation involves three additional, previously unseen perturbations, including dynamic lighting and particle effects, to measure out-of-distribution generalization.
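The perturbations themselves are rendered inside Unity, but an observation-level wrapper conveys the train/evaluation split conceptually. The fog approximation below (blending frames toward a bright, low-contrast gray) is purely an assumption for illustration; at evaluation time the held-out perturbations would be swapped in the same way while the trained policy stays frozen.

```python
import numpy as np
import gymnasium as gym


class FogWrapper(gym.ObservationWrapper):
    """Crude image-space stand-in for the fog perturbation seen during training.

    The real benchmark renders fog (and the held-out perturbations such as
    dynamic lighting and particle effects) in the Unity engine; this wrapper
    only approximates the visual effect for illustration.
    """

    def __init__(self, env, density: float = 0.6):
        super().__init__(env)
        self.density = density  # 0 = no fog, 1 = fully washed out

    def observation(self, obs):
        fog_gray = 220.0  # bright, low-contrast haze
        mixed = (1.0 - self.density) * obs.astype(np.float32) + self.density * fog_gray
        return mixed.clip(0, 255).astype(np.uint8)
```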
Biological Data Integration:
- Mice perform the same task under VR, generating behavioral and neural data.
- Agents’ visual encoders are probed post hoc by replaying the mice’s visual inputs and regressing the resulting agent representations against simultaneously recorded neuronal responses to compute alignment.
Tracks and Metrics:
Track | Metric | Calculation |
---|---|---|
Visual Robustness | Average and Minimum Success Rate (ASR, MSR) | Score combines ASR and MSR across all 5 visual regimes (2 seen in training, 3 held out) |
Neural Alignment | Maximum mean Pearson correlation across agent layers | Score = mean Pearson r between predicted and recorded responses after PCA and linear regression, maximized over layers |
Correlations are computed per recorded neuron, with responses predicted via linear regression from agent activations (Schneider et al., 17 Sep 2025).
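A minimal sketch of how ASR and MSR could be computed from rollouts across the five regimes is given below. The helper names and trial counts are illustrative, and because the exact formula combining ASR and MSR into the leaderboard score is not reproduced in this summary, the mean of the two is used only as a placeholder.

```python
from typing import Callable, Dict

import numpy as np


def success_rate(policy: Callable, env_factory: Callable, n_trials: int = 100) -> float:
    """Fraction of trials in which the target is intercepted within the time limit."""
    successes = 0
    for _ in range(n_trials):
        env = env_factory()
        obs, _ = env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
        successes += int(terminated)  # terminated == interception; truncated == timeout
    return successes / n_trials


def robustness_scores(policy: Callable, env_factories: Dict[str, Callable]) -> Dict:
    """Evaluate one frozen policy across all visual regimes (2 seen + 3 held out)."""
    per_regime = {name: success_rate(policy, make) for name, make in env_factories.items()}
    asr = float(np.mean(list(per_regime.values())))  # Average Success Rate
    msr = float(np.min(list(per_regime.values())))   # Minimum (worst-case) Success Rate
    # Placeholder combination only; the official score's exact formula is not shown here.
    return {"per_regime": per_regime, "ASR": asr, "MSR": msr, "score": 0.5 * (asr + msr)}
```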
3. Comparative Performance: Mice versus RL Agents
Biological Mice:
- Mice maintain consistently high success rates in the foraging task, even under severe visual perturbations, evidencing strong generalization and task resilience.
- Mouse cortical representations evolve to support robust, object-centric navigation despite domain shift, as indicated by stable performance and alignment of visual cortex activity.
Artificial RL Agents:
- Baseline agents (e.g., Proximal Policy Optimization (PPO) with CNN encoders, ResNet-18, or neuro-inspired CNNs) generally perform well in the normal and fog conditions but exhibit pronounced performance drops under unseen perturbations.
- Agents with high average success rates (ASR) often show sharp drops in minimum success rate (MSR) on specific perturbations, revealing signature failure modes absent in mouse behavior.
- Neural alignment evaluations indicate that, even without neural supervision, some agent architectures exhibit emergent correspondence with mouse visual population codes. The strength and layer specificity of this alignment are highly architecture- and training-regime-dependent and do not guarantee behavioral robustness.
4. Risk-Assessment, Foraging Behavior, and Robustness Mechanisms
Biological Insights:
- Mice leverage evolved risk assessment: slow, cautious approach, “waiting” at decision points, and robust strategies under partial observability (Han et al., 18 May 2025).
- In predator-avoidance and foraging mazes, mice show pronounced thigmotaxis and high visitation near cover, optimizing survival over efficiency.
Artificial Agent Limitations and Solutions:
- Naive RL agents exhibit path-optimal but risk-prone policies, readily taking direct routes under uncertainty and lacking hesitation or wall-following (Han et al., 18 May 2025).
- Interventions such as the Trauma-Inspired Safety Buffer (TISB) and variance-penalized TD learning have been proposed to instill risk-aware, naturalistic exploration (see the code sketch after the table below). TISB amplifies negative ("near-death") experiences in experience replay, while variance-penalized TD introduces uncertainty costs tied to action-value dispersion across state–action ensembles, encouraging avoidance of high-risk, high-uncertainty situations.
Mechanism | Description | Biological Parallel |
---|---|---|
Trauma-Inspired Safety Buffer | Overweights and amplifies negative experiences | Single-event trauma memory in mice |
Variance-Penalized TD Learning | Penalizes action-value uncertainty | Natural hesitation and risk aversion |
These modifications cause RL agents to exhibit pausing, hedging, wall-following, and exploratory behaviors more closely resembling those observed in real mice, yielding increased survival at the expense of some efficiency (Han et al., 18 May 2025).
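To make the two mechanisms concrete, the sketch below implements a variance-penalized TD(0) update over an ensemble of tabular Q-functions and a TISB-style replay weighting. The hyperparameters (kappa, trauma_threshold, trauma_boost) and the tabular setting are illustrative assumptions, not the exact formulation of Han et al. (18 May 2025).

```python
import numpy as np


def variance_penalized_td_update(q_ensemble, s, a, r, s_next, done,
                                 alpha=0.1, gamma=0.99, kappa=1.0):
    """One TD(0) update per ensemble member with an uncertainty penalty.

    The penalty subtracts kappa * std of the ensemble's next-state values,
    so high-variance (uncertain, risky) transitions are valued less.
    Tabular arrays q[s, a] are used only to keep the sketch self-contained.
    """
    next_values = np.array([q[s_next].max() for q in q_ensemble])
    uncertainty = next_values.std()
    for q, v_next in zip(q_ensemble, next_values):
        target = r - kappa * uncertainty + (0.0 if done else gamma * v_next)
        q[s, a] += alpha * (target - q[s, a])


def tisb_sample_weights(rewards, trauma_threshold=-1.0, trauma_boost=10.0):
    """Trauma-Inspired Safety Buffer: overweight strongly negative experiences.

    Transitions whose reward falls below `trauma_threshold` ("near-death"
    events such as predator contact) are sampled `trauma_boost` times more
    often from the replay buffer, mimicking single-event trauma memory.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    weights = np.where(rewards <= trauma_threshold, trauma_boost, 1.0)
    return weights / weights.sum()
```

Sampling replay indices with `np.random.choice(len(rewards), p=tisb_sample_weights(rewards))` then reproduces the overweighting of aversive events, while the kappa term biases learned values away from high-variance outcomes.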
5. Neural Alignment and Emergent Representation
A novel contribution of the Mouse vs. AI benchmark is the systematic evaluation of internal model dynamics for similarity to biological vision. This is achieved by linear readout from agent activations (after PCA) to real mouse V1/HVA activity recorded during foraging; a schematic implementation follows the list below. Key findings include:
- Agents trained on task success alone can yield representations partially predictive of mouse neural activity, with the strength of alignment sensitive to network depth, architecture, and task regime (Schneider et al., 17 Sep 2025).
- Emergent neural alignment may be linked to agent robustness: models with higher alignment sometimes show improved generalization, though the correlation is not universal.
- This approach allows task-driven learning to be scrutinized for both behavioral competence and biological plausibility, creating a unified comparative framework across artificial and biological agents.
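The scoring pipeline described above (PCA on agent activations, linear regression to neuronal responses, mean Pearson r maximized over layers) can be sketched as follows; the number of principal components, the plain least-squares regressor, and the held-out split are assumptions standing in for the benchmark's exact preprocessing.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def layer_alignment(activations: np.ndarray, neural: np.ndarray,
                    n_components: int = 100, seed: int = 0) -> float:
    """Mean Pearson r between predicted and recorded responses for one agent layer.

    activations: (n_frames, n_units)   agent features for the replayed mouse stimuli
    neural:      (n_frames, n_neurons) recorded responses (e.g., deconvolved calcium)
    """
    X = PCA(n_components=min(n_components, *activations.shape)).fit_transform(activations)
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, neural, test_size=0.25, random_state=seed)
    Y_hat = LinearRegression().fit(X_tr, Y_tr).predict(X_te)
    # Pearson r per neuron between held-out predictions and recordings
    rs = [np.corrcoef(Y_hat[:, i], Y_te[:, i])[0, 1] for i in range(neural.shape[1])]
    return float(np.nanmean(rs))


def alignment_score(layer_activations: dict, neural: np.ndarray) -> float:
    """Benchmark-style score: the best mean correlation across agent layers."""
    return max(layer_alignment(acts, neural) for acts in layer_activations.values())
```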
6. Broader Implications for Robust AI and Computational Neuroscience
The benchmark’s dual focus supports advances in both robust AI and systems neuroscience:
- For RL/AI: Visual robustness and risk-awareness, as exhibited by mice, remain challenging for modern embodied AI. The benchmark exposes gaps in generalization and naturalistic uncertainty handling, paving the way for new architectures, inductive biases, and training protocols inspired by biological systems (Schneider et al., 17 Sep 2025).
- For Neuroscience: The convergence of behavioral and representational measures enables reciprocal insight: artificial agents inform theories of efficient representation, core computations (evidence accumulation, value estimation), and planning strategies in animal navigation, while biological data validates the ecological relevance of computational models.
- For Cross-Disciplinary Science: The integration of behaviorally grounded RL, computer vision, and large-scale neural recording sets a precedent for future neuroethological benchmarks, promoting mutually informative progress in animal behavior modeling and AI safety/robustness research.
7. Open Questions and Future Directions
Several open research avenues are highlighted by the Mouse vs. AI framework:
- What architectural or algorithmic innovations are necessary for RL agents to match, or exceed, biological robustness to visual and task perturbations?
- How do internal representation dynamics differ under explicit neural alignment objectives versus pure behavioral optimization?
- Can biologically derived data (e.g., neural recordings, animal trajectories) be exploited for offline RL or improved transfer/generalization?
- To what extent do interventions like TISB or variance-based penalties scale to more complex environments or richer sensory domains?
- How closely must agents’ internal representations match those of biological systems to realize animal-level generalization, and is deep alignment necessary for robust behavior?
Summary Table: Core Aspects of the Mouse vs. AI Robust Foraging Competition
Aspect | Mice | RL Agents (Baseline) | RL Agents (w/ Biologically Inspired Mechanisms) |
---|---|---|---|
Visual Robustness | High, across perturbations | Low to moderate | Improved, but not at biological level |
Risk Assessment | Cautious, hesitation, detours | Direct, risk-prone | Increased hesitation, safer exploration |
Success Rate (ASR/MSR) | Consistently high | Decreases under shift | Gap reduced, but mice remain superior |
Neural Alignment (V1/HVA) | Reference standard | Variable, arch-dependent | Enhanced by neuro-inspired architectures |
Adaptive Behavior | Robust under uncertainty | Fails under OOD shift | More robust and animal-like |
References
- Mouse vs. AI: Robust Foraging Competition benchmark (Schneider et al., 17 Sep 2025)
- Of Mice and Machines: Comparison of Learning (Han et al., 18 May 2025)
- Adaptive Patch Foraging in Deep RL Agents (Wispinski et al., 2022)
- Achieving Mouse-Level Strategic Evasion Performance (Espinosa et al., 2022)
- Simulating How Animals Learn: Bayesian MCMC (Thompson et al., 2022)
- Evolution of Sustained Foraging in 3D (Chaumont et al., 2011)
The Mouse vs. AI: Robust Foraging Competition provides a comprehensive, neuroethologically grounded testbed for evaluating and advancing the generalization, robustness, and biological fidelity of visually guided RL agents, with parallel implications for theories of adaptive behavior and neural representation in natural systems.