
Curiosity-driven Exploration in RL

Updated 5 June 2026
  • Curiosity-driven exploration is a framework where agents use intrinsic rewards like prediction error and information gain to autonomously seek novel experiences.
  • It helps agents overcome sparse rewards and supports self-organized curriculum learning for acquiring skills and world models.
  • Empirical studies in robotics, gaming, and scientific discovery show that these methods enhance exploration efficiency and adaptive learning.

Curiosity-driven exploration is a class of algorithms and theoretical frameworks in reinforcement learning (RL), robotics, and autonomous behavior that use intrinsic motivation to drive agents to seek out novel or informative experiences systematically, often independently of any extrinsic reward signal. By rewarding the agent for reducing uncertainty, maximizing learning progress, or discovering rare state transitions, curiosity-driven mechanisms support efficient skill acquisition, robust world modeling, and emergent curriculum learning, and they have proven especially valuable in environments with sparse or deceptive rewards.

1. Foundational Principles and Motivation

Curiosity is framed as an internal, reward-like signal that supplements or replaces extrinsic supervision in RL and developmental robotics. Its goal is to guide agents toward unknown, surprising, or informative regions of the state–action space, supporting several key functions:

  • Overcoming sparse and deceptive rewards: In many practical settings (robot exploration, open-world games, scientific discovery), meaningful external rewards are infrequent or misleading. By rewarding model prediction error, learning progress, or information gain, curiosity can bootstrap exploration where entropy bonuses or random action noise alone fail (Oudeyer, 2018, Pathak et al., 2017).
  • Systematic skill and world model acquisition: Curiosity mechanisms direct behavior toward states with high epistemic value (uncertainty reduction), facilitating continual learning and model refinement (Oudeyer, 2018, Tinguy et al., 2023).
  • Self-organized curriculum: By focusing on regions of high learning progress, agents can autonomously sequence their own developmental trajectory, first mastering easy-to-learn skills and moving toward more difficult ones as competence grows (Oudeyer, 2018, Laversanne-Finot et al., 2018).
  • Discovery of controllable features and affordances: Curiosity-driven sampling of goals or questions enables the unsupervised disentanglement of factors of variation relevant for control, as well as identification of rare but solvable subtasks (Laversanne-Finot et al., 2018, Kaur et al., 2021).

2. Core Algorithmic Formulations

Several major algorithmic families operationalize curiosity, each with distinct mathematical and computational underpinnings:

Prediction-Error–based Intrinsic Rewards

Many modern architectures, such as the Intrinsic Curiosity Module (ICM), implement curiosity as the discrepancy between a learned forward model's prediction and the agent's observed transition, measured in a learned feature space:

$$r_t^{\mathrm{int}} = \eta \, \lVert \hat\phi(s_{t+1}) - \phi(s_{t+1}) \rVert^2$$

where $\phi(\cdot)$ is a learned feature encoder (often trained jointly with a self-supervised inverse-dynamics model), $\hat\phi(s_{t+1})$ is the forward model's prediction of $\phi(s_{t+1})$ given $(s_t, a_t)$, and $\eta > 0$ scales the bonus (Pathak et al., 2017, Zhelo et al., 2018, Dooraki et al., 2023). This signal is added to the extrinsic reward used in policy optimization:

$$r_t = r_t^{\mathrm{ext}} + r_t^{\mathrm{int}}$$
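The following is a minimal sketch of the prediction-error bonus and the reward sum above, not the ICM implementation itself. It assumes a fixed random linear encoder standing in for $\phi$ (ICM instead learns $\phi$ through its inverse model) and a linear forward model updated online with a normalized LMS step; the class and method names are hypothetical.

```python
import numpy as np


class ForwardModelCuriosity:
    """Simplified ICM-style bonus: r_int = eta * ||phi_hat(s') - phi(s')||^2.

    Assumptions (not from the ICM paper): phi is a fixed random linear
    projection, and the forward model is linear, trained by normalized LMS.
    """

    def __init__(self, state_dim, n_actions, feat_dim=8, eta=1.0, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W_phi = rng.normal(size=(feat_dim, state_dim)) / np.sqrt(state_dim)
        self.W_f = np.zeros((feat_dim, feat_dim + n_actions))  # forward model weights
        self.n_actions = n_actions
        self.eta = eta
        self.lr = lr

    def phi(self, s):
        return self.W_phi @ s

    def _forward_input(self, s, a):
        # Forward-model input: [phi(s_t); one_hot(a_t)]
        one_hot = np.zeros(self.n_actions)
        one_hot[a] = 1.0
        return np.concatenate([self.phi(s), one_hot])

    def intrinsic_reward(self, s, a, s_next):
        x = self._forward_input(s, a)
        err = self.W_f @ x - self.phi(s_next)  # phi_hat(s') - phi(s')
        r_int = self.eta * float(err @ err)
        # Normalized LMS step on 0.5 * ||err||^2: for this same input,
        # the next prediction error shrinks by a factor of (1 - lr).
        self.W_f -= self.lr * np.outer(err, x) / float(x @ x)
        return r_int


curiosity = ForwardModelCuriosity(state_dim=4, n_actions=2)
s, s_next = np.ones(4), np.array([1.0, 0.0, -1.0, 0.5])
r_ext = 0.0  # sparse task: no extrinsic reward on this transition
bonuses = [curiosity.intrinsic_reward(s, 0, s_next) for _ in range(3)]
rewards = [r_ext + b for b in bonuses]  # r_t = r_t^ext + r_t^int
```

Repeating the same deterministic transition shrinks the bonus geometrically as the forward model learns it, which is what pushes the agent toward transitions it cannot yet predict.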

Extensions include memory-augmented models that mitigate catastrophic forgetting (Schillaci et al., 2020) and attention-weighted rational curiosity (Reizinger et al., 2019).

Learning Progress and Goal-based Sampling

Rather than rewarding the magnitude of prediction error, several algorithms reward its absolute change over time for a given goal, known as learning progress (LP):

$$\mathrm{LP}_i(t) = \tanh\bigl(\lvert e_i(t) - e_i(t-\Delta t) \rvert\bigr)$$

where $e_i(t)$ is the model error for goal $i$ at time $t$ and $\Delta t$ is the comparison window. The goal-selection policy then favors goals with the highest LP, so that agents focus on tasks currently yielding the largest improvements while avoiding both saturated (fully learned) and unlearnable (irreducible-error) regions (Schillaci et al., 2020, Oudeyer, 2018, Laversanne-Finot et al., 2018).
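A minimal sketch of LP-driven goal selection follows. Assumptions not taken from the cited papers: $e_i(t)$ and $e_i(t-\Delta t)$ are smoothed as the mean error over the most recent window of attempts and the window before it, goals with too little history get an optimistic LP, and sampling mixes LP-proportional weights with an epsilon-uniform floor. The class name and parameters are hypothetical.

```python
import math
import random


class LearningProgressSampler:
    """Sample goals in proportion to LP_i(t) = tanh(|e_i(t) - e_i(t - dt)|)."""

    def __init__(self, n_goals, window=4, epsilon=0.1, seed=0):
        self.errors = [[] for _ in range(n_goals)]  # error history per goal
        self.window = window                        # attempts per smoothing window
        self.epsilon = epsilon                      # share of uniform sampling
        self.rng = random.Random(seed)

    def record(self, goal, error):
        self.errors[goal].append(error)

    def learning_progress(self, goal):
        hist, w = self.errors[goal], self.window
        if len(hist) < 2 * w:
            return 1.0  # optimistic default (hypothetical) so new goals get tried
        recent = sum(hist[-w:]) / w          # smoothed e_i(t)
        older = sum(hist[-2 * w:-w]) / w     # smoothed e_i(t - dt)
        return math.tanh(abs(recent - older))

    def probabilities(self):
        lp = [self.learning_progress(g) for g in range(len(self.errors))]
        n, total = len(lp), sum(lp)
        if total == 0.0:
            return [1.0 / n] * n  # no progress anywhere: fall back to uniform
        return [self.epsilon / n + (1.0 - self.epsilon) * v / total for v in lp]

    def sample(self):
        goals = range(len(self.errors))
        return self.rng.choices(goals, weights=self.probabilities(), k=1)[0]


sampler = LearningProgressSampler(n_goals=3)
for t in range(8):
    sampler.record(0, 0.0)             # mastered: error already zero
    sampler.record(1, 1.0 - 0.1 * t)   # learnable: error falls steadily
    sampler.record(2, float(t % 2))    # noisy: error flips 0/1 with no trend
probs = sampler.probabilities()
next_goal = sampler.sample()
```

Here the mastered goal and the noisy goal both receive zero LP, so nearly all sampling mass goes to the goal whose error is steadily falling; the smoothing window is what keeps the alternating errors of the noisy goal from registering as progress.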

Bayesian Surprise and Information Gain

Normative formulations conceptualize curiosity as the expected information gain or Bayesian surprise from a transition:

$$r_t^{\mathrm{int}} = D_{\mathrm{KL}}\bigl[\, q_\phi(z_{t+1} \mid s_t, a_t, s_{t+1}) \,\big\|\, p_\phi(z_{t+1} \mid s_t, a_t) \,\bigr]$$

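where $q_\phi$ and $p_\phi$ are the variational posterior and prior over the latent state $z_{t+1}$ of a learned world model (Mazzaglia et al., 2021, Tinguy et al., 2023). Because it rewards belief updates rather than raw prediction error, this approach is designed to sidestep the "noisy TV" problem of prediction-error methods, in which an agent is drawn to unpredictable but uninformative stimuli such as a screen showing random noise: irreducible noise that the model already expects produces little change in the posterior, so mainly learnable surprises are rewarded.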
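To make the KL term concrete, the sketch below assumes that $q_\phi$ and $p_\phi$ are diagonal Gaussians, a common choice in variational world models, for which the KL divergence has a closed form. The means and standard deviations that a world model's posterior and prior networks would output are replaced here by hypothetical hard-coded values, and the function names are illustrative.

```python
import numpy as np


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL[q || p] between diagonal Gaussians, summed over latent dimensions."""
    mu_q, std_q = np.asarray(mu_q, float), np.asarray(std_q, float)
    mu_p, std_p = np.asarray(mu_p, float), np.asarray(std_p, float)
    var_q, var_p = std_q ** 2, std_p ** 2
    per_dim = np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    return 0.5 * float(np.sum(per_dim))


def bayesian_surprise_bonus(posterior, prior):
    """r_int = KL[q(z'|s, a, s') || p(z'|s, a)]; each argument is (mean, std)."""
    return gaussian_kl(*posterior, *prior)


prior = ([0.0, 0.0], [1.0, 1.0])       # p(z_{t+1} | s_t, a_t)
no_update = ([0.0, 0.0], [1.0, 1.0])   # observing s_{t+1} changed nothing
shifted = ([1.0, 0.0], [1.0, 1.0])     # belief moved along one latent dimension
sharpened = ([0.0, 0.0], [0.5, 0.5])   # belief became more certain
```

An observation that leaves the belief unchanged earns no bonus, whereas one that shifts or sharpens the posterior earns a positive bonus.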

Multi-Modal and Structured Curiosity

Recent methods extend curiosity-driven exploration to multi-modal association (e.g., audio-visual pairing; Dean et al., 2020), structured, grounded question answering (Kaur et al., 2021), and meta-learned programmatic curiosity algorithms (Alet et al., 2020). In multi-agent systems, mixed-objective or context-calibrated curiosity modules incorporate both individual and collective novelty (Reyes et al., 2022, Pan et al., 25 Sep 2025).

3. Integration with RL and Practical Implementations

Curiosity-driven exploration is typically integrated with standard RL methods by combining intrinsic and extrinsic rewards within the agent's policy-update loop.

Practical considerations include episodic memory buffers to balance plasticity and stability (Schillaci et al., 2020), normalization/clipping of curiosity bonuses for reward signal stability (Dai et al., 11 Sep 2025), and hybridization with entropy regularization or random action perturbation for baseline coverage (Zhelo et al., 2018, Reizinger et al., 2019).
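One common way to normalize and clip curiosity bonuses is to divide each bonus by a running estimate of its standard deviation and then clip the result. The sketch below assumes that scheme, with Welford's online variance; it is not necessarily the procedure used by Dai et al., and the class name and clip threshold are hypothetical.

```python
import math


class RunningRewardNormalizer:
    """Divide bonuses by a running standard deviation, then clip to [-clip, clip]."""

    def __init__(self, clip=5.0, eps=1e-8):
        self.count, self.mean, self.m2 = 0, 0.0, 0.0  # Welford accumulators
        self.clip = clip
        self.eps = eps

    def _update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Population std of all bonuses seen so far; 1.0 until two samples exist.
        return math.sqrt(self.m2 / self.count) if self.count > 1 else 1.0

    def __call__(self, r_int):
        self._update(r_int)
        scaled = r_int / (self.std() + self.eps)
        return max(-self.clip, min(self.clip, scaled))


normalizer = RunningRewardNormalizer(clip=5.0)
routine = [normalizer(0.01 * (i % 2)) for i in range(50)]  # small, regular bonuses
spike = normalizer(1000.0)                                  # sudden huge bonus
```

Without the clip, a single rare transition could dominate the combined reward; with it, the spike is capped at the clip value while routine bonuses keep their relative scale.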

4. Empirical Validation and Application Domains

Curiosity-driven exploration has demonstrated strong empirical advantages across a range of RL, robotics, and autonomous discovery domains:

| Domain/Environment | Methodological Innovation | Empirical Outcome |
|---|---|---|
| Atari/Sonic/Gym tasks | ICM, LBS (latent Bayesian surprise), audio-visual SHE (See, Hear, Explore) | Faster coverage, superior sparse-reward scores (Pathak et al., 2017, Mazzaglia et al., 2021, Dean et al., 2020) |
| Robot navigation | World model, LP, memory | Dense spatial exploration, improved generalization (Schillaci et al., 2020, Tinguy et al., 2023, Zhelo et al., 2018) |
| Chemistry labs | Goal-space curiosity (CA) | 3–14× more diverse phenotypes discovered (Grizou et al., 2019) |
| LLM test generation | Coverage-map feedback, Q-value | 51–77% improvement in branch coverage over greedy baselines (Amayuelas et al., 6 Apr 2026) |
| Multi-agent RL | Mixed, context-calibrated bonuses | Increased coordinated discovery, robust to peer randomness (Reyes et al., 2022, Pan et al., 25 Sep 2025) |

In robotics, curiosity-based systems achieve sample-efficient, open-ended skill learning and self-organized curriculum generation, enabling agents to autonomously discover rare affordances such as multi-object manipulation (Laversanne-Finot et al., 2018, Oudeyer, 2018). In scientific experimentation, curiosity-guided robots systematically uncover rare phenomena and nontrivial parameter dependencies that random exploration misses (Grizou et al., 2019). In naturalistic, multi-agent, and virtual environments, calibrated curiosity supports robust, goal-agnostic exploration, even under severe partial observability and stochasticity (Tinguy et al., 2023, Reyes et al., 2022, Pan et al., 25 Sep 2025).

5. Key Advances, Limitations, and Theoretical Underpinnings

Several key theoretical advances and empirical findings shape the current understanding of curiosity-driven exploration:

  • Robustness to stochasticity: Bayesian surprise and information-gain approaches are less prone to pathological reward amplification in unpredictable environments, a known failure mode of prediction-error-based bonuses (Mazzaglia et al., 2021, Tinguy et al., 2023).
  • Plasticity-stability tradeoff: Episodic memory mechanisms preserve model stability while allowing for fast adaptation, essential for continual online exploration (Schillaci et al., 2020).
  • Structured curiosity and hierarchical goals: Goal- or question-based curiosity agents can efficiently explore combinatorial or relational spaces by focusing on interpretable abstractions, rather than undifferentiated novelty (Kaur et al., 2021, Laversanne-Finot et al., 2018, Schillaci et al., 2020).
  • Meta-learned exploration: Automated discovery of curiosity mechanisms via programmatic search uncovers variants that match or exceed state-of-the-art human-designed algorithms across discrete and continuous domains (Alet et al., 2020).
  • Multi-agent coordination: Calibrated, context-aware curiosity bonuses promote both individual and group-level exploration, addressing the pitfalls of uniform novelty bias or uncoordinated redundant search (Reyes et al., 2022, Pan et al., 25 Sep 2025).
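The stochasticity failure mode in the first point above can be reproduced with a toy predictor. In this sketch (an illustrative assumption, not a method from the cited papers), a single scalar estimate stands in for a forward model; on a deterministic stream its squared-error bonus decays, while on coin-flip noise it stays near the noise variance indefinitely, mimicking the "noisy TV" trap. The function name is hypothetical.

```python
import random


def mean_prediction_error_bonus(targets, lr=0.1):
    """Mean squared-error bonus of a toy online predictor over a stream.

    Toy stand-in (assumption): the "forward model" is one scalar estimate
    of the next observation, updated by SGD on 0.5 * err**2.
    """
    pred, total = 0.0, 0.0
    for y in targets:
        err = y - pred
        total += err * err  # prediction-error bonus paid at this step
        pred += lr * err    # gradient step toward the observed outcome
    return total / len(targets)


rng = random.Random(0)
n_steps = 2000
learnable = [1.0] * n_steps                                  # deterministic outcome
noisy_tv = [rng.choice([0.0, 2.0]) for _ in range(n_steps)]  # irreducible coin flips

bonus_learnable = mean_prediction_error_bonus(learnable)
bonus_noisy = mean_prediction_error_bonus(noisy_tv)
```

Because the noisy stream has variance 1, no predictor can drive its expected squared error below that level, so a pure prediction-error bonus keeps rewarding the agent for watching it.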

Limitations include vulnerability to reward saturation or curiosity "blockade" when transitions are either trivially predictable or practically unreachable, and the continuing challenge of balancing exploration with task exploitation over long learning horizons (Pathak et al., 2017, Oudeyer, 2018). Furthermore, deployment in large, dynamic real-world environments remains difficult because of limits on world-model adaptation and computational overhead (Tinguy et al., 2023).

6. Research Directions and Cross-Domain Impact


As a synthesis of normative Bayesian exploration, developmental robotics heuristics, and modern deep RL architectures, curiosity-driven exploration remains a central principle for scalable, autonomous, and robust learning in artificial agents (Oudeyer, 2018, Laversanne-Finot et al., 2018).

