Hierarchical Reinforcement Learning Methods
- Hierarchical reinforcement learning methods are algorithmic strategies that decompose complex tasks into layered subtasks using temporal and representational abstractions.
- They integrate frameworks like the options model, query process abstractions, and latent variable policies to enhance exploration, sample efficiency, and knowledge transfer.
- These approaches offer robust sensorimotor-grounded representations but face challenges with scalable abstraction construction and efficient hierarchy stacking.
Hierarchical Reinforcement Learning (HRL) methods refer to a broad set of algorithmic strategies that introduce multiple levels of temporal or representational abstraction into reinforcement learning agents. HRL exploits the often-inherent hierarchical structure present in complex environments, enabling agents to decompose tasks into sub-tasks and build layered policies, subgoal abstractions, or skill repertoires. The goal is to improve exploration, sample efficiency, transfer, and interpretability by leveraging the temporal and compositional regularities that cannot be efficiently exploited by monolithic “flat” RL policies.
1. Theoretical Foundations and Motivations
Hierarchical RL is motivated by two key observations. First, many complex tasks consist of subtasks or can be naturally decomposed into nested, temporally extended sequences of actions (“options”). Second, decomposing RL into interacting modules operating on different time-scales, state abstractions, or policy spaces can yield improved sample efficiency, generalization, and credit assignment in long-horizon or sparse-reward problems.
Formally, HRL is often cast in terms of semi-Markov decision processes (SMDPs) with temporally extended actions. The options framework is one widely adopted model, where each option is defined by an initiation set, a policy, and a termination condition. Alternatively, methods can discover or predefine sub-goals, employ latent-variable models, or leverage explicit high-level planners.
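As a minimal illustration of the options framework described above, the sketch below represents an option by its initiation set, intra-option policy, and termination condition, and executes it inside an SMDP-style control loop. It is a simplified, hypothetical implementation (the `env.step` interface is an assumption), not code from any cited paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = Hashable

@dataclass
class Option:
    """A temporally extended action in the options framework."""
    initiation_set: Set[State]              # states where the option may be started
    policy: Callable[[State], Action]       # intra-option policy: state -> primitive action
    termination: Callable[[State], float]   # termination probability beta(s) in [0, 1]

def run_option(env, state: State, option: Option):
    """Execute one option until it terminates.

    Assumes an environment with env.step(action) -> (next_state, reward, done).
    Returns (next_state, cumulative_reward, duration), i.e. an SMDP transition.
    """
    assert state in option.initiation_set, "option not available in this state"
    total_reward, duration = 0.0, 0
    while True:
        action = option.policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        duration += 1
        if done or random.random() < option.termination(state):
            return state, total_reward, duration
```

A higher-level policy then chooses among such options rather than primitive actions, which is what gives rise to the SMDP formulation.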
2. Approaches to Hierarchical Structure and Representation Learning
A core challenge in HRL is discovering or constructing useful abstractions. Several distinct methodologies have been developed:
- Query Process Abstractions: In “Grounding Hierarchical Reinforcement Learning Models for Knowledge Transfer” (Wernsdorfer et al., 2014), the HRL architecture forms abstractions by mapping sensorimotor interaction histories $h_t$ to latent state representations $z_t$ using a learned abstraction function $\phi$. These representations are “grounded” in the agent’s actual experience rather than imposed a priori. The hierarchical policy builds value and transition models not only over raw sensorimotor states $s_t$, but also over the abstract latent states. Actions at higher levels are framed as query processes, i.e., requests to transition toward a target latent state.
- Deep Model-Based HRL: This approach unifies model-based RL and deep representation learning, enabling agents to autonomously construct and ground arbitrarily abstract policies through iterative sensorimotor interaction. Both low-level and high-level representations are updated simultaneously via modified SARSA learning and model estimation, bypassing the need for designer-imposed “semantic” spaces (Wernsdorfer et al., 2014).
- Latent Variable Policies: Other families leverage invertible mappings between latent and action spaces together with maximum-entropy objectives, allowing each layer of a hierarchical policy to develop a diverse set of strategies; higher-level policies modulate lower ones via their latent variables (Haarnoja et al., 2018). A simplified sketch of this layering appears after this list.
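The sketch below illustrates, in deliberately simplified form, the layered latent-variable idea: a higher-level policy outputs a latent variable that modulates a lower-level policy, which maps state and latent to an action. It is a schematic assumption-laden sketch, not the architecture of Haarnoja et al. (2018), which additionally relies on invertible mappings and maximum-entropy training.

```python
import torch
import torch.nn as nn

class LowLevelPolicy(nn.Module):
    """Maps (state, latent from the level above) to action logits."""
    def __init__(self, state_dim: int, latent_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state, latent):
        return self.net(torch.cat([state, latent], dim=-1))

class HighLevelPolicy(nn.Module):
    """Outputs the latent variable that modulates the lower level."""
    def __init__(self, state_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, state):
        return self.net(state)

def act(state, high: HighLevelPolicy, low: LowLevelPolicy):
    """Two-level action selection: the high level sets the latent, the low level acts."""
    latent = high(state)
    return low(state, latent)
```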
3. Autonomous Discovery of Hierarchical Abstractions
Key technical mechanisms for discovering HRL structures include:
| Approach | Description | Core Mathematical Objects |
|---|---|---|
| Sensorimotor query | Abstraction from sensorimotor interaction histories to latent states | Abstraction function $\phi$, latent states $z$ |
| Options framework | Initiation and termination of temporally extended actions | Initiation set, intra-option policy, termination condition |
| Subgoal induction | Mapping experiences into subgoals using clustering, anomaly detection, or skill chaining | Clusters in state/reward space, K-means centers, anomaly states |
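As a concrete illustration of the subgoal-induction row in the table above, one common recipe clusters visited states and treats the cluster centers as candidate subgoals. The sketch below assumes visited states are available as feature vectors; it is one simple instantiation, not the only one.

```python
import numpy as np
from sklearn.cluster import KMeans

def induce_subgoals(visited_states: np.ndarray, n_subgoals: int = 4) -> np.ndarray:
    """Cluster visited states and return the cluster centers as candidate subgoals.

    visited_states: array of shape (n_states, state_dim) gathered during exploration.
    """
    kmeans = KMeans(n_clusters=n_subgoals, n_init=10).fit(visited_states)
    return kmeans.cluster_centers_

# Example: states visited in a 2-D gridworld, reduced to four candidate subgoals.
states = np.random.rand(500, 2)
subgoals = induce_subgoals(states, n_subgoals=4)
```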
In “Grounding Hierarchical Reinforcement Learning Models for Knowledge Transfer” (Wernsdorfer et al., 2014), the method enables the agent to autonomously learn a hierarchy by aggregating sensorimotor experiences and constructing abstractions without a fixed pre-defined world state set. Each new latent state is added only when a novel interaction sequence is detected, which allows for efficient knowledge reuse and flexible adaptation across tasks.
A crucial formulation is the abstraction function together with the query policy defined over the latent states it produces:
$$z_t = \phi(h_t), \qquad \pi_{\text{query}}(q \mid z_t),$$
with an abstract value function $V(z)$ and a model $M(z, q, z')$ tied to each latent state.
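The novelty-driven construction of latent states described above can be sketched as follows. This is an illustrative reconstruction under simplifying assumptions (hashable (motor, observation) pairs, a fixed-length history window), not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

History = Tuple[Tuple[Hashable, Hashable], ...]   # ((motor, observation), ...)

class GroundedAbstraction:
    """Maps sensorimotor histories to latent states, allocating a new one on novelty."""

    def __init__(self, window: int = 2):
        self.window = window                               # history suffix length used by phi
        self.latent_of: Dict[History, int] = {}            # learned abstraction function phi
        self.value: Dict[int, float] = defaultdict(float)  # abstract value V(z), updated elsewhere

    def phi(self, history: History) -> int:
        """Return the latent state for a history; create one if the suffix is novel."""
        key = history[-self.window:]
        if key not in self.latent_of:                      # novel interaction sequence detected
            self.latent_of[key] = len(self.latent_of)
        return self.latent_of[key]

# Usage: latent states are created lazily as new (motor, observation) patterns occur.
abstraction = GroundedAbstraction(window=2)
z = abstraction.phi((("turn_left", "wall_right"), ("forward", "corridor")))
```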
4. Knowledge Transfer and Generalization
One of HRL’s principal advantages is its capacity for knowledge transfer. The structure and parameterization of abstract/layered policies, options, or latent skills enable agents to reuse learned behaviors in novel environments, provided that the subjective sensorimotor experience is similar, even when the objective state configuration is new. In the referenced work (Wernsdorfer et al., 2014), transfer is studied between “objective” interaction (where states are defined by absolute positions) and “subjective” interaction (states are defined by local configurations). Hierarchical policies grounded in sensorimotor histories disambiguate perceptually similar but contextually distinct situations and transfer policies robustly across task variants.
Experiments demonstrate that whereas flat Markov policies struggle in ambiguous subjective environments, hierarchically learned query policies correctly leverage recent motor activations and observations—effectively incorporating short-term memory—yielding robust transfer. For example, action sequences such as “if you last turned and now see a wall to your right, then turn left…” allow the agent to abstract away from absolute position information.
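A toy rendering of this difference, keeping only the structure of the example above (all names are hypothetical): an "objective" policy is keyed on absolute position and breaks when the layout changes, whereas a "subjective" query policy is keyed on the last motor action plus the current local observation, which transfers across layouts.

```python
# Objective policy: keyed on absolute grid position -> fails under a new layout.
objective_policy = {(3, 7): "turn_left"}

# Subjective query policy: keyed on (last action, local observation) -> layout-agnostic.
subjective_policy = {("turn_right", "wall_on_right"): "turn_left"}

def subjective_act(last_action: str, observation: str) -> str:
    """'If you last turned right and now see a wall to your right, turn left.'"""
    return subjective_policy.get((last_action, observation), "move_forward")
```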
5. Mathematical Formulation and Learning Updates
Key mathematical constructs central to hierarchical RL include the definition of the underlying Markov Decision Process (MDP) $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$, hierarchical mapping functions, value estimation, and update rules. The work in (Wernsdorfer et al., 2014) formalizes the framework as follows:
- Low-level sensorimotor state: $s_t = (m_t, o_t)$, pairing the agent's motor activation $m_t$ with its observation $o_t$
- Abstraction to latent state: $z_t = \phi(h_t)$ for an interaction history $h_t = (s_1, \ldots, s_t)$
- Value function for sensorimotor states: $V(s_t) = \mathbb{E}\big[\sum_{k \ge 0} \gamma^k r_{t+k}\big]$, estimated with SARSA-style updates
- Abstract value function: $V(z)$, defined analogously over latent states and updated alongside the low-level estimate
- Query process and transition modeling: a query $q$ issued in latent state $z$ targets a successor latent state $z'$, with a model $M(z, q, z')$ capturing transition feasibility
This bridges the gap between model-based RL (with explicit system dynamics) and deep abstraction-based HRL, with both perception and action unified in sensorimotor representations.
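A minimal tabular sketch of the simultaneous SARSA-style updates described in this section, using the generic notation above. The coupling between the two levels is deliberately simplified and should be read as an assumption, not as the exact update of Wernsdorfer et al. (2014).

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95

Q_low = defaultdict(float)    # value estimates over (sensorimotor state, action) pairs
Q_high = defaultdict(float)   # value estimates over (latent state, query) pairs

def sarsa_update(Q, state, action, reward, next_state, next_action):
    """Standard on-policy SARSA update: Q <- Q + alpha * (r + gamma * Q' - Q)."""
    td_target = reward + GAMMA * Q[(next_state, next_action)]
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])

# Both levels are updated from the same experience stream:
# the low level on (s, a, r, s', a') and the high level on (z, q, R, z', q'),
# where R is the reward accumulated while the query q was active, e.g.
# sarsa_update(Q_low, s, a, r, s2, a2) and sarsa_update(Q_high, z, q, R, z2, q2).
```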
6. Comparison to Conventional HRL and Limitations
The principal distinguishing features of this approach, compared to conventional HRL methods (such as those based on static, designer-imposed option structures or partitioned MDPs), are:
- No static state space: Latent abstractions are constructed from lived experience, reducing the reliance on semantic priors.
- Integrated perception and action: Sensorimotor states blend observations and actions, yielding unified, experience-driven representations.
- Flexible, grounded query mechanism: The decision to activate or transition between abstractions is made via a query process grounded in history, not simply via observable transitions.
- Improved transfer: Grounded structures permit greater knowledge transfer in domains where perceptual aliasing or unseen states would hinder flat RL.
However, these benefits come at the cost of increased state-space complexity (especially when actions and observations are paired), and constructing robust abstraction functions remains a substantial challenge: histories may be insufficiently expressive, and disambiguating similar experiences can require higher-level context.
7. Open Problems and Future Directions
Two open problems are emphasized in the literature:
- Construction of robust abstraction functions: Developing functions that reliably and efficiently summarize histories for latent-state induction remains unresolved, especially in high-dimensional or partially observed domains. Additional context, inductive biases, or hierarchical priors may be necessary.
- Scalability and hierarchy stacking: Autonomous stacking of query processes to form multi-level hierarchies is essential for large-scale or compositional tasks. Mechanisms for passing information, such as prior probabilities or biasing representations from high to low levels, are suggested as promising directions.
A key ongoing question is how to balance the computational and model complexity incurred by richer hierarchical structures, especially the size of value and model functions, against the gains in transfer and sample efficiency.
Hierarchical reinforcement learning methods, particularly as exemplified by frameworks based on sensorimotor-grounded abstraction and query processes (Wernsdorfer et al., 2014), provide a rigorous, flexible means of learning highly abstract, transferable representations from raw interaction data without imposing strong designer-driven priors. Although this approach shows significant promise for knowledge transfer and autonomy in complex, ambiguous environments, challenges in abstraction construction and representation scaling remain active research areas. The field continues to advance toward robust hierarchical architectures capable of supporting transfer, interpretability, and efficient exploration in artificial agents.