Hierarchical Reinforcement Learning Model
- This Hierarchical Reinforcement Learning model is a framework that builds multi-level latent representations grounded in sensorimotor interactions for effective knowledge transfer.
- It employs query processes to model abstract transitions, enabling recursive policy learning and robust planning over coarse state representations.
- The model offers improved sample efficiency and robustness in complex, partially observable environments compared to traditional flat reinforcement learning methods.
Hierarchical Reinforcement Learning (HRL) models introduce multi-level abstractions to reinforcement learning, enabling agents to learn, plan, and transfer knowledge more effectively over long temporal horizons and in partially observable or complex environments. The framework defined in "Grounding Hierarchical Reinforcement Learning Models for Knowledge Transfer" (Wernsdorfer et al., 2014) formalizes a deep, model-based variant of HRL that autonomously constructs abstract representations—latent states—directly grounded in sensorimotor interaction histories. This architecture supports robust knowledge transfer, recursive policy learning, and improved efficiency compared to traditional model-free RL.
1. Sensorimotor Grounding and Hierarchical Structure
At its foundation, the HRL model builds abstract representations by grounding all higher-level latent variables in concrete sensorimotor experiences. The atomic unit of interaction is the sensorimotor state

$$x = (a, s),$$

where $a$ is the previous action and $s$ is the current sensory observation. Rather than separating perception and action, the model treats the concatenation $(a, s)$ as a single coherent experience.
Histories, i.e. sequences of experience tuples $h = (x_1, \ldots, x_t)$, form the substrate for abstraction. An abstraction function $\phi$ maps histories to latent states $l \in L$:

$$\phi : H \rightarrow L, \qquad l = \phi(h).$$

This latent state, derived directly from sensory and motor interaction, enables the formation of multi-level representations. By recursively applying abstraction over histories of lower-level latent states, the model induces arbitrarily deep hierarchies, with each abstract state at level $n$ summarizing observed interaction context from level $n-1$.
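A minimal sketch of these ideas in Python follows. The names `SensorimotorState`, `History`, and `abstract`, as well as the hashing-based grouping of histories, are illustrative assumptions for this sketch, not the paper's implementation.

```python
from collections import namedtuple
from typing import Tuple

# The atomic experience x = (a, s): previous action and current observation.
SensorimotorState = namedtuple("SensorimotorState", ["a", "s"])

# A history is a sequence of sensorimotor states.
History = Tuple[SensorimotorState, ...]

def abstract(history: History) -> int:
    """Map a history of lower-level states to a latent-state identifier.

    The hashing-based grouping here is only a stand-in; the model groups
    histories that share interaction regularities.
    """
    return hash(tuple((x.a, x.s) for x in history)) % 1024

# Example: a short history grounded in raw action/observation pairs.
h0 = (SensorimotorState(a="move", s="wall"),
      SensorimotorState(a="turn", s="door"))
l1 = abstract(h0)  # latent state at level 1, grounded in the history h0
```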
2. Query Processes and Abstract Transitions
Transition modeling in the latent (abstract) space is handled by query processes (QPs), which differ fundamentally from flat MDP formulations. Whereas standard RL uses a transition function $T : S \times A \rightarrow S$, QPs define a query function

$$q : L \times L \rightarrow \{0, 1\},$$

indicating success or failure of a transition between latent states. The model learns to approximate this query function via a learned predicate $M(l, l')$ (inducibility). Each latent state $l$ is additionally associated with a local policy $\pi_l$, and $V(l)$ estimates the expected return when occupying $l$.

Transitions in abstract space are realized by querying whether a transition from the current latent state $l_t$ to a candidate $l'$ is feasible (i.e., $q(l_t, l') = 1$ as predicted by $M$), and validated by confirming $\phi(h) = l'$ for the history $h$ generated under the queried policy.
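The sketch below shows one way a query process could be represented with tabular estimates; the class name `QueryProcess` and its methods are hypothetical, and `theta` stands for the inducibility threshold discussed in the learning section below.

```python
from collections import defaultdict

class QueryProcess:
    """Tabular sketch of a QP over latent states (names are illustrative)."""

    def __init__(self, theta: float = 0.8):
        self.theta = theta              # inducibility threshold
        self.M = defaultdict(float)     # M[(l, l_next)] approximates the query predicate
        self.V = defaultdict(float)     # V[l] estimates expected return in l

    def query(self, l, l_next) -> bool:
        """Predict whether the transition l -> l_next is inducible."""
        return self.M[(l, l_next)] >= self.theta

    def validate(self, history, l_next, abstract_fn) -> bool:
        """Confirm that the history generated under the queried policy
        actually abstracts to the requested latent state."""
        return abstract_fn(history) == l_next
```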
3. Learning: Model-based Updates with Sensorimotor Integration
The learning process operates on sensorimotor states as the atomic units, adapting standard RL update rules, such as SARSA, to the value function:

$$V(x_t) \leftarrow V(x_t) + \alpha \left[ r_{t+1} + \gamma\, V(x_{t+1}) - V(x_t) \right],$$

or a normalized variant of this update.
For the model function (inducibility), the update is

$$M(l_t, l') \leftarrow M(l_t, l') + \alpha \left[ m_{t+1} - M(l_t, l') \right],$$

with $m_{t+1} = 1$ if the query results in the observed next latent state and $m_{t+1} = 0$ otherwise.
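A tabular sketch of both updates follows, assuming dictionary-backed `V` and `M` tables and ordinary learning-rate and discount parameters; the helper names are hypothetical.

```python
def update_value(V, x_t, x_next, reward, alpha=0.1, gamma=0.9):
    """SARSA-style update: V(x_t) <- V(x_t) + alpha * (r + gamma*V(x_next) - V(x_t))."""
    v_t, v_next = V.get(x_t, 0.0), V.get(x_next, 0.0)
    V[x_t] = v_t + alpha * (reward + gamma * v_next - v_t)

def update_model(M, l_t, l_queried, l_observed, alpha=0.1):
    """Move M(l_t, l_queried) toward 1 if the queried latent state was reached,
    toward 0 otherwise."""
    target = 1.0 if l_observed == l_queried else 0.0
    m = M.get((l_t, l_queried), 0.0)
    M[(l_t, l_queried)] = m + alpha * (target - m)
```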
Query selection at each layer prioritizes transitions whose predicted inducibility exceeds a threshold $\theta$ and, among those, maximizes the value estimate:

$$l^{*} = \arg\max_{\,l' \,:\, M(l_t,\, l') \,\geq\, \theta} V(l').$$
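A corresponding selection routine might look as follows; again, this is a sketch with assumed names rather than the paper's code.

```python
def select_query(l_t, candidates, M, V, theta=0.8):
    """Among candidates whose predicted inducibility from l_t exceeds theta,
    return the one with the highest value estimate (None if none qualifies)."""
    feasible = [l for l in candidates if M.get((l_t, l), 0.0) >= theta]
    return max(feasible, key=lambda l: V.get(l, 0.0)) if feasible else None
```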
By recursively stacking QPs, higher levels perform planning over coarse state representations encoded as latent states, all ultimately grounded in sensorimotor transitions.
4. Knowledge Transfer via Latent Abstraction
The architecture enables efficient knowledge transfer in two primary ways:
- Autonomous Abstraction: The abstraction function groups together sensorimotor histories with similar regularities. This causes the learned latent policies to become independent of specific low-level details, allowing application in different tasks or environments where abstract relationships persist, even if the underlying sensorimotor specifics differ.
- Policy Reuse Across Hierarchy: Since every latent state is paired with its policy, once an agent acquires an abstract representation in one setting, these policies can be queried and reused in physically or contextually novel situations that share similar latent structure but differ in surface features. This mechanism bypasses the need to relearn low-level control whenever higher-level abstractions remain valid.
5. Comparative Advantages of Deep Model-Based HRL
The hierarchical model-based architecture results in the following empirical and theoretical benefits over model-free RL:
- Sample Efficiency via Planning: By learning explicit models (the inducibility predicate and QPs), agents can perform lookahead planning. Given the learned models, simulated rollouts enable efficient evaluation of candidate action sequences, improving sample efficiency in new or dynamic environments (see the planning sketch after this list).
- Complexity Reduction through Abstraction: Hierarchical abstractions contract the combinatorial sensorimotor state space to a manageable set of latent states, enhancing generalization and accelerating learning in multi-task or partially observable domains.
- Grounding and Robustness: Since abstractions are built from actual experience—sensorimotor histories—rather than hand-designed symbols, the symbol grounding problem is attenuated, resulting in more meaningful high-level representations that remain coupled to actionable behavior.
- Partial Observability Handling: Latent states discovered from interaction histories are less sensitive to missing or ambiguous observations, and can “collapse” ambiguities encountered by flat model-free methods, improving robustness in POMDP settings.
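As a rough illustration of such lookahead, the sketch below scores candidate latent-state plans using `M` and `V` tables like those in the earlier sketches; the scoring scheme (product of inducibility estimates times a discounted value) is an assumption for illustration, not the paper's procedure.

```python
def score_plan(l_start, plan, M, V, gamma=0.9):
    """Feasibility-weighted, discounted value of a sequence of latent states."""
    prob, l = 1.0, l_start
    for l_next in plan:
        prob *= M.get((l, l_next), 0.0)   # predicted inducibility of each hop
        l = l_next
    return prob * (gamma ** len(plan)) * V.get(l, 0.0)

def plan_lookahead(l_start, candidate_plans, M, V):
    """Return the candidate plan with the best predicted score."""
    return max(candidate_plans, key=lambda p: score_plan(l_start, p, M, V))
```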
6. Schematic Table of Hierarchy Construction
| Level | State | Policy components | Transition model |
|---|---|---|---|
| 0 (base) | $x = (a, s)$ | $V_0$, $M_0$ | Sensorimotor (observed) |
| 1 | $l_1 \in L_1$ | $V_1$, $M_1$ | QP over latent states ($M_1$) |
| ... | ... | ... | ... |
| $N$ | $l_N \in L_N$ | $V_N$, $M_N$ | QP at highest abstraction |
At each layer, policies and model transitions are tied to the respective latent or sensorimotor representation, forming a recursive stack.
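For illustration, the recursive stack from the table could be represented as a simple list of layers, each bundling its value estimates, inducibility model, and abstraction function; all names here are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class Layer:
    V: Dict[Any, float] = field(default_factory=dict)               # value estimates
    M: Dict[Tuple[Any, Any], float] = field(default_factory=dict)   # inducibility model
    abstract: Callable[[tuple], Any] = hash                         # history -> latent state

# Level 0 operates on raw sensorimotor states; each higher level runs a QP
# over latent states produced by the level below it.
hierarchy: List[Layer] = [Layer() for _ in range(3)]
```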
7. Implications and Research Context
This hierarchical, deep model-based approach formalizes a recursive, sensorimotor-grounded process for learning and transferring knowledge, departing from "flat" RL methods by integrating value modeling and explicit transition modeling at every abstraction level (Wernsdorfer et al., 2014). The recursive construction via histories and QPs anticipates later advances in hierarchical RL, particularly with respect to knowledge transfer, abstraction induction, and improved sample efficiency in environments with partial observability, sparse rewards, and changing task structure.
By providing a mathematically rigorous and algorithmically explicit framework for hierarchical abstraction and policy learning, this model remains a reference point for the unification of deep learning, model-based reasoning, and RL in complex, open-ended environments.