Hierarchical Reinforcement Learning Model
- This Hierarchical Reinforcement Learning model is a framework that builds multi-level latent representations grounded in sensorimotor interactions for effective knowledge transfer.
- It employs query processes to model abstract transitions, enabling recursive policy learning and robust planning over coarse state representations.
- The model offers improved sample efficiency and robustness in complex, partially observable environments compared to traditional flat reinforcement learning methods.
Hierarchical Reinforcement Learning (HRL) models introduce multi-level abstractions to reinforcement learning, enabling agents to learn, plan, and transfer knowledge more effectively over long temporal horizons and in partially observable or complex environments. The framework defined in "Grounding Hierarchical Reinforcement Learning Models for Knowledge Transfer" (Wernsdorfer et al., 2014) formalizes a deep, model-based variant of HRL that autonomously constructs abstract representations—latent states—directly grounded in sensorimotor interaction histories. This architecture supports robust knowledge transfer, recursive policy learning, and improved efficiency compared to traditional model-free RL.
1. Sensorimotor Grounding and Hierarchical Structure
At its foundation, the HRL model builds abstract representations by grounding all higher-level latent variables in concrete sensorimotor experiences. The atomic unit of interaction is the sensorimotor state

$$x = (a, s),$$

where $a$ is the previous action and $s$ is the current sensory observation. Rather than separating perception and action, the model treats the concatenation $(a, s)$ as a single coherent experience.
Histories, i.e. sequences of experience tuples $h = (x_1, \ldots, x_t)$, form the substrate for abstraction. An abstraction function $\phi$ maps histories to latent states $l \in L$:

$$\phi : H \rightarrow L, \qquad l = \phi(h).$$

This latent state, derived directly from sensory and motor interaction, enables the formation of multi-level representations. By recursively applying abstraction over histories of lower-level latent states, the model induces arbitrarily deep hierarchies, with each abstract state at level $n$ summarizing observed interaction context from level $n-1$.
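A minimal sketch of these ideas in Python follows. The names `SensorimotorState`, `History`, and `abstract`, as well as the hashing-based grouping of histories, are illustrative assumptions for this sketch, not the paper's implementation.

```python
from collections import namedtuple
from typing import Tuple

# The atomic experience x = (a, s): previous action and current observation.
SensorimotorState = namedtuple("SensorimotorState", ["a", "s"])

# A history is a sequence of sensorimotor states.
History = Tuple[SensorimotorState, ...]

def abstract(history: History) -> int:
    """Map a history of lower-level states to a latent-state identifier.

    The hashing-based grouping here is only a stand-in; the model groups
    histories that share interaction regularities.
    """
    return hash(tuple((x.a, x.s) for x in history)) % 1024

# Example: a short history grounded in raw action/observation pairs.
h0 = (SensorimotorState(a="move", s="wall"),
      SensorimotorState(a="turn", s="door"))
l1 = abstract(h0)  # latent state at level 1, grounded in the history h0
```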
2. Query Processes and Abstract Transitions
Transition modeling in the latent (abstract) space is handled by query processes (QPs), which differ fundamentally from flat MDP formulations. Whereas standard RL uses a transition function $T : S \times A \rightarrow S$, QPs define a query function

$$q : L \times L \rightarrow \{0, 1\},$$

indicating success or failure of a transition between latent states. The model learns to approximate this query function via a learned predicate $M(l, l')$ (inducibility). Each latent state $l$ is additionally associated with a local policy $\pi_l$, and $V(l)$ estimates the expected return when occupying $l$.

Transitions in abstract space are realized by querying whether a transition from the current latent state $l_t$ to a candidate $l'$ is feasible (i.e., $q(l_t, l') = 1$ as predicted by $M$), and validated by confirming $\phi(h) = l'$ for the history $h$ generated under the queried policy.
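The sketch below shows one way a query process could be represented with tabular estimates; the class name `QueryProcess` and its methods are hypothetical, and `theta` stands for the inducibility threshold discussed in the learning section below.

```python
from collections import defaultdict

class QueryProcess:
    """Tabular sketch of a QP over latent states (names are illustrative)."""

    def __init__(self, theta: float = 0.8):
        self.theta = theta              # inducibility threshold
        self.M = defaultdict(float)     # M[(l, l_next)] approximates the query predicate
        self.V = defaultdict(float)     # V[l] estimates expected return in l

    def query(self, l, l_next) -> bool:
        """Predict whether the transition l -> l_next is inducible."""
        return self.M[(l, l_next)] >= self.theta

    def validate(self, history, l_next, abstract_fn) -> bool:
        """Confirm that the history generated under the queried policy
        actually abstracts to the requested latent state."""
        return abstract_fn(history) == l_next
```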
3. Learning: Model-based Updates with Sensorimotor Integration
The learning process operates on sensorimotor states as the atomic units, adapting standard RL update rules, such as SARSA, to the value function:

$$V(x_t) \leftarrow V(x_t) + \alpha \left[ r_{t+1} + \gamma\, V(x_{t+1}) - V(x_t) \right],$$

or a normalized variant of this update.
For the model function (inducibility), the update is

$$M(l_t, l') \leftarrow M(l_t, l') + \alpha \left[ m_{t+1} - M(l_t, l') \right],$$

with $m_{t+1} = 1$ if the query results in the observed next latent state and $m_{t+1} = 0$ otherwise.
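A tabular sketch of both updates follows, assuming dictionary-backed `V` and `M` tables and ordinary learning-rate and discount parameters; the helper names are hypothetical.

```python
def update_value(V, x_t, x_next, reward, alpha=0.1, gamma=0.9):
    """SARSA-style update: V(x_t) <- V(x_t) + alpha * (r + gamma*V(x_next) - V(x_t))."""
    v_t, v_next = V.get(x_t, 0.0), V.get(x_next, 0.0)
    V[x_t] = v_t + alpha * (reward + gamma * v_next - v_t)

def update_model(M, l_t, l_queried, l_observed, alpha=0.1):
    """Move M(l_t, l_queried) toward 1 if the queried latent state was reached,
    toward 0 otherwise."""
    target = 1.0 if l_observed == l_queried else 0.0
    m = M.get((l_t, l_queried), 0.0)
    M[(l_t, l_queried)] = m + alpha * (target - m)
```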
Query selection at each layer prioritizes transitions whose predicted inducibility exceeds a threshold $\theta$ and, among those, maximizes the value estimate:

$$l^{*} = \arg\max_{\,l' \,:\, M(l_t,\, l') \,\geq\, \theta} V(l').$$
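A corresponding selection routine might look as follows; again, this is a sketch with assumed names rather than the paper's code.

```python
def select_query(l_t, candidates, M, V, theta=0.8):
    """Among candidates whose predicted inducibility from l_t exceeds theta,
    return the one with the highest value estimate (None if none qualifies)."""
    feasible = [l for l in candidates if M.get((l_t, l), 0.0) >= theta]
    return max(feasible, key=lambda l: V.get(l, 0.0)) if feasible else None
```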
By recursively stacking QPs, higher levels perform planning over coarse state representations encoded as latent states, all ultimately grounded in sensorimotor transitions.
4. Knowledge Transfer via Latent Abstraction
The architecture enables efficient knowledge transfer in two primary ways:
- Autonomous Abstraction: The abstraction function groups together sensorimotor histories with similar regularities. This causes the learned latent policies to become independent of specific low-level details, allowing application in different tasks or environments where abstract relationships persist, even if the underlying sensorimotor specifics differ.
- Policy Reuse Across Hierarchy: Since every latent state is paired with its policy, once an agent acquires an abstract representation in one setting, these policies can be queried and reused in physically or contextually novel situations that share similar latent structure but differ in surface features. This mechanism bypasses the need to relearn low-level control whenever higher-level abstractions remain valid.
5. Comparative Advantages of Deep Model-Based HRL
The hierarchical model-based architecture results in the following empirical and theoretical benefits over model-free RL:
- Sample Efficiency via Planning: By learning explicit models (the inducibility predicate and QPs), agents can perform lookahead planning. Given the learned models, simulated rollouts enable efficient evaluation of candidate action sequences, improving sample efficiency in new or dynamic environments (see the planning sketch after this list).
- Complexity Reduction through Abstraction: Hierarchical abstractions contract the combinatorial sensorimotor state space to a manageable set of latent states, enhancing generalization and accelerating learning in multi-task or partially observable domains.
- Grounding and Robustness: Since abstractions are built from actual experience—sensorimotor histories—rather than hand-designed symbols, the symbol grounding problem is attenuated, resulting in more meaningful high-level representations that remain coupled to actionable behavior.
- Partial Observability Handling: Latent states discovered from interaction histories are less sensitive to missing or ambiguous observations, and can “collapse” ambiguities encountered by flat model-free methods, improving robustness in POMDP settings.
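As a rough illustration of such lookahead, the sketch below scores candidate latent-state plans using `M` and `V` tables like those in the earlier sketches; the scoring scheme (product of inducibility estimates times a discounted value) is an assumption for illustration, not the paper's procedure.

```python
def score_plan(l_start, plan, M, V, gamma=0.9):
    """Feasibility-weighted, discounted value of a sequence of latent states."""
    prob, l = 1.0, l_start
    for l_next in plan:
        prob *= M.get((l, l_next), 0.0)   # predicted inducibility of each hop
        l = l_next
    return prob * (gamma ** len(plan)) * V.get(l, 0.0)

def plan_lookahead(l_start, candidate_plans, M, V):
    """Return the candidate plan with the best predicted score."""
    return max(candidate_plans, key=lambda p: score_plan(l_start, p, M, V))
```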
6. Schematic Table of Hierarchy Construction
| Level | State | Policy components | Transition model |
|---|---|---|---|
| 0 (base) | $x = (a, s)$ | $V_0$, $M_0$ | Sensorimotor (observed) |
| 1 | $l_1 \in L_1$ | $V_1$, $M_1$ | QP over latent states ($M_1$) |
| ... | ... | ... | ... |
| $N$ | $l_N \in L_N$ | $V_N$, $M_N$ | QP at highest abstraction |
At each layer, policies and model transitions are tied to the respective latent or sensorimotor representation, forming a recursive stack.
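For illustration, the recursive stack from the table could be represented as a simple list of layers, each bundling its value estimates, inducibility model, and abstraction function; all names here are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class Layer:
    V: Dict[Any, float] = field(default_factory=dict)               # value estimates
    M: Dict[Tuple[Any, Any], float] = field(default_factory=dict)   # inducibility model
    abstract: Callable[[tuple], Any] = hash                         # history -> latent state

# Level 0 operates on raw sensorimotor states; each higher level runs a QP
# over latent states produced by the level below it.
hierarchy: List[Layer] = [Layer() for _ in range(3)]
```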
7. Implications and Research Context
This hierarchical, deep model-based approach formalizes a recursive, sensorimotor-grounded process for learning and transferring knowledge, departing from "flat" RL methods by integrating value modeling and explicit transition modeling at every abstraction level (Wernsdorfer et al., 2014). The recursive construction via histories and QPs anticipates later advances in hierarchical RL, particularly with respect to knowledge transfer, abstraction induction, and improved sample efficiency in environments with partial observability, sparse rewards, and changing task structure.
By providing a mathematically rigorous and algorithmically explicit framework for hierarchical abstraction and policy learning, this model remains a reference point for the unification of deep learning, model-based reasoning, and RL in complex, open-ended environments.