Resource-Rational Hierarchical Decomposition

Updated 25 February 2026

Resource-rational hierarchical decomposition is a framework that optimally balances task rewards with the costs of computation, memory, and information processing.
It uses a multi-level optimization in which action selection, subgoal choice, and hierarchical design jointly improve the efficiency of planning and reinforcement learning.
The approach enables practical applications—from gridworld navigation to control systems—by achieving compressed representations and substantial computational savings.

Resource-rational hierarchical decomposition refers to a class of principled frameworks that define how hierarchical representations or policies should be structured to optimally allocate limited computational, memory, or information-processing resources. This paradigm synthesizes ideas from hierarchical planning, bounded rationality, information theory, and reinforcement learning. Across different instantiations—spanning shortest-path planning, policy specialization, and value-function factorization—resource-rational decomposition formalizes which hierarchical abstractions or decompositions should be constructed, given the agent’s resource constraints and task demands, to best trade off task performance against computational and representational effort (Correa et al., 2020, Hihn et al., 2019, Marthi et al., 2012).

1. Formal Foundations and Objective Functions

Resource-rational hierarchical decomposition characterizes hierarchical task or policy representations as the solution to a constrained or regularized optimization, where optimality is defined not only in terms of task reward or utility, but also in terms of the cost of cognitive search, information-processing, or memory. The specific form of the objective depends on the domain and the nature of resource constraints:

Shortest-path setting: The agent chooses a set of subgoals $Z$ to maximize expected subtask-level value, defined as the difference between reward (negative path length) and planning cost (e.g., number of search nodes expanded). The high-level objective is

$Z^* = \arg\max_{Z\subseteq S}\, E_{(s,g)\sim p}[\,V^g_Z(s)\,],$

where $V^g_Z(s)$ is the Bellman-optimal subtask-level value for reaching $g$ from $s$ via subgoals $Z$ (Correa et al., 2020).
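
As a rough illustration of this objective, the sketch below brute-forces $Z$ on a toy graph. It is not the procedure of Correa et al. (2020): the graph, the `cost_weight` parameter, and the restriction to plans with at most one intermediate subgoal are assumptions made to keep the example short. Planning cost is counted as the number of nodes expanded by breadth-first search, and tasks $(s,g)$ are drawn uniformly.

```python
from collections import deque
from itertools import combinations


def bfs(graph, start, goal):
    """Breadth-first search; returns (path_length, nodes_expanded)."""
    frontier = deque([(start, 0)])
    seen = {start}
    expanded = 0
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return dist, expanded
        expanded += 1  # count each node whose neighbours we generate
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    raise ValueError(f"{goal!r} is unreachable from {start!r}")


def leg_value(graph, s, t, cost_weight):
    """Reward (negative path length) minus weighted planning cost for one leg."""
    length, expanded = bfs(graph, s, t)
    return -length - cost_weight * expanded


def task_value(graph, s, g, subgoals, cost_weight):
    """Assumed one-subgoal simplification of V^g_Z(s): go directly, or via one z in Z."""
    best = leg_value(graph, s, g, cost_weight)
    for z in subgoals:
        if z not in (s, g):
            via = (leg_value(graph, s, z, cost_weight)
                   + leg_value(graph, z, g, cost_weight))
            best = max(best, via)
    return best


def mean_value(graph, subgoals, cost_weight=1.0):
    """Average task value over all ordered (s, g) pairs, i.e. a uniform p(s, g)."""
    tasks = [(s, g) for s in graph for g in graph if s != g]
    total = sum(task_value(graph, s, g, subgoals, cost_weight) for s, g in tasks)
    return total / len(tasks)


def best_subgoals(graph, k, cost_weight=1.0):
    """Exhaustively search subgoal sets of size k (only feasible for tiny graphs)."""
    return max(combinations(sorted(graph), k),
               key=lambda Z: mean_value(graph, Z, cost_weight))


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
TOY_GRAPH = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
             3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
Z_star = best_subgoals(TOY_GRAPH, k=1)
```

Because the direct plan is always among the candidates, adding subgoals can never lower `mean_value` in this simplified form. Which node is selected depends on the graph, the search algorithm, and `cost_weight`.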

Information-theoretic decision making: The agent partitions the state space into region-experts, subject to mutual information constraints reflecting bounded rationality. The hierarchical policy maximizes

$\mathbb{E}[\,U(s,a)\,] - \frac{1}{\beta_1} I(S;X) - \frac{1}{\beta_2} I(S;A|X),$

where $U$ is utility, and the $\beta$ terms control information-resource trade-offs (Hihn et al., 2019).
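
For finite state, expert, and action sets, this objective can be evaluated directly from the selector $p(x|s)$ and the expert policies $p(a|s,x)$. The following is a minimal sketch under that assumption; the function names and array layout are illustrative and do not come from Hihn et al. (2019). Information is measured in nats.

```python
import numpy as np


def mutual_information(joint, cond, marginal):
    """Sum of joint * log(cond / marginal) over entries with joint > 0 (in nats)."""
    cond = np.broadcast_to(cond, joint.shape)
    marginal = np.broadcast_to(marginal, joint.shape)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(cond[mask] / marginal[mask])))


def free_energy(p_s, p_x_given_s, p_a_given_sx, U, beta1, beta2):
    """E[U] - I(S;X)/beta1 - I(S;A|X)/beta2 for finite S, X, A.

    Shapes: p_s (S,), p_x_given_s (S, X), p_a_given_sx (S, X, A), U (S, A).
    """
    joint_sx = p_s[:, None] * p_x_given_s              # p(s, x)
    p_x = joint_sx.sum(axis=0)                         # p(x)
    joint_sxa = joint_sx[:, :, None] * p_a_given_sx    # p(s, x, a)
    joint_xa = joint_sxa.sum(axis=0)                   # p(x, a)
    # p(a | x); experts that are never selected keep an all-zero row.
    p_a_given_x = np.divide(joint_xa, p_x[:, None],
                            out=np.zeros_like(joint_xa),
                            where=p_x[:, None] > 0)
    expected_utility = float(np.sum(joint_sxa * U[:, None, :]))
    i_sx = mutual_information(joint_sx, p_x_given_s, p_x[None, :])
    i_sa_given_x = mutual_information(joint_sxa, p_a_given_sx,
                                      p_a_given_x[None, :, :])
    return expected_utility - i_sx / beta1 - i_sa_given_x / beta2
```

With a deterministic selector and experts that ignore the state once selected, all of the information cost falls on $I(S;X)$. With a uniform selector and uniform experts, both information terms vanish and the objective reduces to the expected utility.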

These formulations explicitly trade off task performance with search or representation cost, leading to optimal hierarchies that are sensitive to both environment structure and computational resource limits.

2. Multi-level Hierarchical Optimization

Resource-rational decomposition exhibits a characteristic multi-level structure in the underlying optimization, with each level associated with a distinct aspect of the planning or policy process:

Action-level optimization: Plans concrete action sequences given a subgoal or context.
Subgoal/policy-selection level: Chooses the order or assignment of subtasks or specialized policies (experts) to maximize intermediate value, penalized by the search or information-processing cost at this level.
Decomposition/design level: Selects the decomposition itself—e.g., subgoal sets, hierarchical clusters, or expert assignments—that optimizes overall performance-cost trade-off (Correa et al., 2020, Hihn et al., 2019).

The optimization structure is typically nested, with higher levels depending recursively on value/cost calculations from lower levels.
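
For the shortest-path setting, this nesting can be written schematically as follows. The notation is assumed for illustration rather than taken from the cited papers: $R(s,z)$ denotes the reward (negative path length) of the plan found from $s$ to $z$, and $C(s,z)$ the planning cost of finding it.

$\text{Action level:}\quad \big(R(s,z),\, C(s,z)\big) \ \text{obtained by running the planner from } s \text{ to } z$

$\text{Subgoal level:}\quad V^g_Z(s) = \max_{z \in (Z \cup \{g\}) \setminus \{s\}} \big[\, R(s,z) - C(s,z) + V^g_Z(z) \,\big], \qquad V^g_Z(g) = 0$

$\text{Design level:}\quad Z^* = \arg\max_{Z\subseteq S}\, E_{(s,g)\sim p}[\,V^g_Z(s)\,]$

Each level consumes the quantities computed by the level before it, so a change in the planner's cost $C$ propagates upward and can change which subgoal set $Z^*$ is optimal.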

3. Algorithmic Approaches and Representational Efficiency

Instantiation of resource-rational decomposition depends on the planning or learning paradigm:

Graph-search planning: Planning cost is operationalized as the number of nodes expanded by the search algorithm (e.g., BFS or A*). Hierarchical subgoal sets $Z$ are selected to minimize average planning cost while bounding the memory/representation cost (e.g., by limiting $|Z|$) (Correa et al., 2020).
Hierarchical Reinforcement Learning (HRL): Value-function components are recursively decomposed and only those components relevant to both parent and child subroutines are represented, via factored or projected representations. Structural conditions such as "additive irrelevance" and "factored exit" enable dramatic reduction in representation and sample complexity (Marthi et al., 2012).
Information-theoretic partitioning: Soft, mutual-information-regularized selectors route states to expert policies, each with bounded information-processing capacity. Learning proceeds online via actor–critic gradient updates whose per-step objectives include both task utility and information cost (Hihn et al., 2019).

These algorithmic choices enable efficient learning and tractable hierarchical abstraction even in large-scale or high-dimensional problems.

4. Emergence and Selection of Hierarchies

The choice of hierarchical decomposition is determined not only by environment structure but, crucially, by the agent’s resource constraints (cognitive, computational, or information-rate limits). Salient findings from empirical and simulation studies include:

Bottleneck discovery: In environments with bottleneck states (e.g., "doorways" in two-room gridworlds), resource-rational models predict subgoal placement at bottlenecks for both uninformed and heuristic searchers, thus aligning with observed human hierarchies (Correa et al., 2020).
Information-driven specialization: Increasing the information budget, that is, raising $\beta_1$ and $\beta_2$ so that the information penalties in the objective shrink, leads to crisper state partitions and sharper policy specialization. As the budget grows from tight to generous, the system interpolates smoothly from a monolithic policy to specialized ones, optimizing global utility under the constraint (Hihn et al., 2019).
Representation compression: Exploiting independence or decoupling (factored exit/separator structures) enables local policies to reason only about a reduced set of variables, confining resource investment precisely to those variables that affect and are affected by local and parent modules (Marthi et al., 2012).

5. Theoretical Properties and Efficiency Bounds

Resource-rational frameworks offer rigorous guarantees regarding optimality, suboptimality due to resource constraints, and representational/computational efficiency:

Optimality structure: Exact solutions can be obtained (e.g., via Blahut–Arimoto in information-theoretic settings), with suboptimality precisely bounded by the information rates $\frac{1}{\beta_1}I(S;X) + \frac{1}{\beta_2}I(S;A|X)$ (Hihn et al., 2019).
Representation cost: Factored and projected representations reduce the number of parameters from the size of the full state space to that of the relevant marginals, often making the representation orders of magnitude more compact, with corresponding savings in memory and sample requirements (Marthi et al., 2012).
Computational efficiency: Hierarchical decomposition reduces average-case planning/learning time by restricting expensive computation to critical subspaces or decision points; ablation studies report substantial speed-ups while hierarchical optimality is maintained (Correa et al., 2020, Marthi et al., 2012).
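
The Blahut–Arimoto iteration mentioned above can be sketched for the simplest case. The code below is a single-level analogue that maximizes $\mathbb{E}[U(s,a)] - \frac{1}{\beta} I(S;A)$ with no selector stage. It is an illustrative simplification, not the two-level scheme of Hihn et al. (2019), and the function name and parameters are assumptions.

```python
import numpy as np


def blahut_arimoto(p_s, U, beta, iters=200):
    """Alternate between the policy p(a|s) and its marginal p(a).

    p_s has shape (S,), U has shape (S, A). Returns (p_a_given_s, p_a).
    """
    n_actions = U.shape[1]
    p_a = np.full(n_actions, 1.0 / n_actions)  # start from a uniform prior
    for _ in range(iters):
        # Policy update: p(a|s) proportional to p(a) * exp(beta * U(s, a)).
        logits = np.log(np.clip(p_a, 1e-300, None))[None, :] + beta * U
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p_a_given_s = np.exp(logits)
        p_a_given_s /= p_a_given_s.sum(axis=1, keepdims=True)
        # Prior update: p(a) is the state-averaged policy.
        p_a = p_s @ p_a_given_s
    return p_a_given_s, p_a
```

At `beta = 0` the policy ignores utility and stays at the prior. As `beta` grows, each state's policy concentrates on its highest-utility action, mirroring the transition from monolithic to specialized behavior described in Section 4.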

6. Applications and Generalizations

Resource-rational hierarchical decomposition has broad applicability across domains:

Gridworld navigation and Tower of Hanoi planning: Principled subgoal selection aligns with intuitive task decompositions and human reaction-time data (Correa et al., 2020).
Classification and regression: Mixtures of linear experts, each assigned to a partition of input space, achieve non-linear performance under information-processing constraints (Hihn et al., 2019).
Control and gain-scheduling: Policy switching among local controllers in dynamical systems realizes efficient piecewise-linear control regimes without manual scheduling surfaces (Hihn et al., 2019).
Reinforcement learning in structured environments: Recursive value-function composition with state abstraction enables hierarchically optimal control and efficient sample usage (Marthi et al., 2012).

The frameworks naturally extend to deeper hierarchies and cooperative multi-agent networks, with layered selectors/experts and distributed information budgets. Under general conditions—differentiable policies, sufficient functional capacity, stable gradients—convergence to locally optimal resource-rational solutions is ensured (Hihn et al., 2019).

7. Normative Insights and Interpretive Remarks

Resource-rational decomposition offers a unifying, normative perspective: hierarchical representations and subgoal structures are not arbitrarily imposed, but arise as principled solutions to the joint optimization of task efficiency and resource expenditure. This perspective departs from simple trajectory compression or unsupervised community detection by explicitly conditioning decomposition on the planner’s algorithms, the structure of the environment, and the agent’s cognitive constraints (Correa et al., 2020, Marthi et al., 2012). Hierarchical structure thus constitutes a form of representational investment, economizing downstream effort by anticipating and optimizing the allocation of limited computational and memory resources.