
Multi-Agent Hierarchical Task Generation

Updated 9 October 2025
  • MAHTG is a systematic framework that decomposes complex multi-agent tasks into manageable subtasks using hierarchical structures and explicit communication protocols.
  • It leverages techniques such as hierarchical reinforcement learning, symbolic planning, and modular decomposition to reduce the computational complexity of distributed, partially observable environments.
  • Empirical evaluations show that MAHTG enhances scalability and performance in scheduling and resource allocation compared to traditional flat coordination methods.

Multi-Agent Hierarchical Task Generation (MAHTG) is the systematic creation and decomposition of complex tasks for multiple agents, leveraging explicit hierarchy, modularization, and structured communication to achieve scalable, coordinated, and efficient problem solving. Central to MAHTG is the integration of hierarchical reinforcement learning, symbolic planning, structured communication, and adaptive decomposition, such that global goals are mapped onto a dynamic sequence of coordinated subtasks. The resulting frameworks support robust multi-agent coordination across distributed, partially observable, and resource-constrained environments.

1. Hierarchical Frameworks and Structured Decomposition

MAHTG methods exploit hierarchy to manage combinatorial complexity. A two-level architecture is common: an upper-level meta-controller (or global planner) decomposes the global coordination task into a sequence of subtasks, each paired with constraints or agent assignments, while lower-level agent controllers execute the assigned subtasks.

In the federated control framework (Kumar et al., 2017), the hierarchy consists of a meta-controller and multiple distributed controllers. At each time step $t$, the meta-controller observes the global environment and selects a pair of agents $(C_i, C_j)$ together with a constraint $c_t$, thereby generating a subtask $g_t$. This upper-level decision limits agent interactions to tractable pairs, transforming an intractable global scheduling problem into a sequence of localized negotiations. Each agent pair negotiates solely over its constrained resource sets, enabling focused policy learning and greatly reducing the effective search space.
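A minimal Python sketch of this upper-level decision step follows. It is a hypothetical illustration rather than the paper's implementation: the callable `q_meta`, the epsilon-greedy scheme, and the explicit enumeration of (pair, constraint) actions are assumptions made for clarity.

```python
import random

def meta_controller_step(global_state, num_agents, constraints, q_meta, epsilon=0.1):
    """Select an agent pair (C_i, C_j) and a constraint c_t, i.e., a subtask g_t."""
    # The meta-controller's abstract action space: all agent pairs x constraints.
    pairs = [(i, j) for i in range(num_agents) for j in range(i + 1, num_agents)]
    actions = [(pair, c) for pair in pairs for c in constraints]

    # Epsilon-greedy choice over (pair, constraint), scored by the meta Q-function.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_meta(global_state, a))
```

Only the selected pair negotiates at step $t$; all other agents are left untouched, which is what keeps the per-step interaction local.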

Hierarchical Petri Net (HPN) methods (Figat et al., 2019) formalize task decomposition into multi-layer networks (system, agent, subsystem, behavior, and communication layers), with tokens representing the entrance and propagation of control across different agent subsystems and communication layers. Each hierarchical layer isolates control, coordination, or communication functionality.
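As a toy illustration of token-based control flow, the following sketch implements a plain place/transition net in Python. It is only a sketch: it omits the layered pages, hierarchy, and communication semantics of the actual HPN formalism in (Figat et al., 2019), and all place and transition names are invented for the example.

```python
class PetriNet:
    def __init__(self, marking):
        self.marking = dict(marking)   # place name -> token count
        self.transitions = {}          # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= 1 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1                              # consume control token
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1      # pass control onward

# Control enters an agent subsystem, activates a behaviour, then a "send" step.
net = PetriNet({"agent_idle": 1})
net.add_transition("start_behaviour", ["agent_idle"], ["behaviour_active"])
net.add_transition("send_msg", ["behaviour_active"], ["msg_sent", "agent_idle"])
net.fire("start_behaviour")
net.fire("send_msg")
```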

The same decomposition principle is observed in reinforcement learning applied to scheduling and resource allocation (Carvalho et al., 2022). Here, high-level centralized schedulers allocate subtasks to decentralized workers, decoupling task assignment from partially observed local task execution.

2. Communication Protocols and Negotiation Strategies

Hierarchical frameworks in MAHTG rely on structured communication among agents to alleviate the bottleneck of joint policy learning. Key methods include:

  • Pairwise constrained negotiation (Kumar et al., 2017): Instead of all-to-all or broadcast communication, the meta-controller permits only a single pair of agents to communicate at each time step. Agent controllers maintain the communication state as a compact structure (e.g., a one-hot vector tracking the partner's prior decision; see the sketch after this list). This reduces communication complexity from quadratic to linear in the number of agents while maintaining global coordination.
  • Layered communication in Petri nets (Figat et al., 2019): Communication is encoded explicitly as "send" and "receive" pages in the net structure, with tokens as data/message carriers. This model supports both intra- and inter-agent communication at specific, well-defined behavioral transitions, enabling design-time verification and modular runtime communication.
  • Decentralized execution with partial observability (Carvalho et al., 2022): When centralized planners are unavailable or disallowed at test time, decentralized agents cooperate solely via shared reward structures. Policies trained with shared experience and parameter sharing learn cooperative behaviors under partial information without direct communication.
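The compact pairwise communication state from the first bullet can be sketched as follows, assuming the partner's prior decision is an index into a discrete resource set; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def encode_partner_choice(partner_action, resource_set_size):
    """One-hot encoding of the partner's last decision (all zeros if none yet)."""
    comm_state = np.zeros(resource_set_size, dtype=np.float32)
    if partner_action is not None:
        comm_state[partner_action] = 1.0
    return comm_state

# A controller's observation concatenates its local state with the
# communication state, so its policy conditions on the negotiation so far.
local_state = np.array([0.0, 1.0, 0.0], dtype=np.float32)
obs = np.concatenate([local_state, encode_partner_choice(2, 4)])
```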

The effect of these communication strategies is seen in the scalability and efficiency of MAHTG systems. Restricting communication and negotiation to structured roles and subtasks enables robust and efficient exploration, allowing learning in large teams (Kumar et al., 2017).

3. Task Decomposition, Constraint Handling, and Learning

MAHTG frameworks decompose global objectives into constraints and pairings, which agents then jointly solve. Core aspects include:

  • Modular constraint-based subtasks: The meta-controller selects both the agent pair and the constraint at every step, dividing a large global constraint (e.g., joint scheduling) into smaller, manageable local constraints (e.g., "agents $i$ and $j$ must select non-overlapping time slots") (Kumar et al., 2017).
  • Subtask-centric temporal abstraction: Agents learn policies not over primitive actions, but over the space of subtasks or options defined by the hierarchy. At each subtask, agents receive intrinsic rewards that reflect constraint satisfaction:

$$r_{\text{intrinsic}} = \begin{cases} 1, & \text{if } a_i \in D_i \land c_t,\; a_j \in D_j \land c_t,\; a_i < a_j \\ 0, & \text{otherwise} \end{cases}$$

where $D_i$ is agent $i$'s resource set and $c_t$ is the imposed constraint.
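A minimal sketch of this reward follows, assuming the constraint $c_t$ is represented as the set of admissible slots and that non-overlap is enforced by the ordering $a_i < a_j$; this representation is an assumption for illustration, not the paper's code.

```python
def intrinsic_reward(a_i, a_j, D_i, D_j, c_t):
    """1 if both actions are feasible under the constraint and ordered, else 0."""
    feasible_i = a_i in D_i and a_i in c_t   # a_i in D_i and satisfying c_t
    feasible_j = a_j in D_j and a_j in c_t   # a_j in D_j and satisfying c_t
    return 1.0 if (feasible_i and feasible_j and a_i < a_j) else 0.0

# Example: the agents pick slots 1 and 3 from overlapping resource sets.
print(intrinsic_reward(1, 3, D_i={1, 2}, D_j={2, 3}, c_t={1, 2, 3}))  # 1.0
```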

  • Learning with experience replay: Both upper-level (meta-controller) and lower-level (individual agent) Q-networks are trained by alternating updates with experience replay buffers (R_M, R_C), as in DQN. This ensures dual-level refinement of negotiation and global decision-making.

These modular decompositions plausibly facilitate efficient exploration and convergence in otherwise intractable multi-agent RL settings, a reading consistent with the empirical results in (Kumar et al., 2017).

4. Empirical Results and Performance Assessment

The federated control framework (Kumar et al., 2017) evaluates the MAHTG approach on simulated distributed scheduling problems under varying complexity:

  • For $m = 2$ agents: All methods, including FCRL, baseline MARL, and non-communicating HRL, find the optimal policy, since only a single pairwise negotiation is required.
  • For $m = 4$ agents: FCRL outperforms both baselines. HRL suffers from the lack of inter-agent communication; MARL, with unstructured communication, achieves moderate success, while FCRL's structured pairwise negotiation yields superior results.
  • For $m = 6$ agents: Both HRL and MARL fail to earn positive rewards due to the increased coordination complexity, while FCRL continues to find valid solutions.

This suggests that hierarchical decomposition with guided interaction scales to larger agent populations where flat or monolithic coordination breaks down.

5. Scalability, Modularity, and Extensions

Key scalability attributes of MAHTG frameworks are:

  • Linear communication complexity: Pairwise task assignments and communication avoid the quadratic scaling typical in flat MARL, supporting many-agent scenarios (Kumar et al., 2017).
  • Abstracted joint action spaces: The meta-controller's decision space (over pairs and constraints) is orders of magnitude smaller than the joint action space of all agents acting simultaneously (see the worked comparison after this list).
  • Composability: By abstracting negotiation and control at the subtask level and using modular architectures (e.g., experience replay, Petri nets), the overall system is both extensible and amenable to verification (Figat et al., 2019).
  • Real-time scheduling and code generation: In robotics, automatic translation from Petri nets to real-time C++ code enables deployment without manual intervention, and the runtime scheduler ensures concurrent subsystem operation (Figat et al., 2019).
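A back-of-the-envelope comparison makes the "orders of magnitude" claim in the second bullet concrete. The numbers below ($m = 6$ agents, 10 primitive actions per agent, 5 candidate constraints) are illustrative assumptions, not figures from the cited papers.

```python
from math import comb

m = 6    # agents
A = 10   # primitive actions per agent
C = 5    # candidate constraints

joint_actions = A ** m           # flat joint action space: 1,000,000
meta_actions = comb(m, 2) * C    # (pair, constraint) choices: 15 * 5 = 75

print(joint_actions, meta_actions)
```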

A plausible implication is that these structural and algorithmic properties enable MAHTG systems to be applied to large-scale real-world multi-agent settings, maintaining tractability and robustness.

6. Application Domains

MAHTG frameworks are applicable to a variety of domains:

| Domain | Decomposition Role | Scalability Benefit |
| --- | --- | --- |
| Multi-agent dialogue | Meta-controller allocates subtasks (e.g., booking) to conversational agents | Modular policy composition |
| Urban traffic control | Agents (vehicles) negotiate pairwise at intersections via meta-scheduling | Local negotiation only |
| Network resource allocation | Routers coordinate via pairwise negotiation for distributed scheduling | Abstracted communication |
| Warehouse robotics | Centralized scheduler assigns bufferized task lists to workers | Partitioned action spaces |

In all these settings, the hierarchical decomposition enables agents to focus on locally tractable subtasks while achieving global coordination, and communication overhead is tightly bounded.

7. Mathematical Formalization and Algorithmic Outline

The task generation and control process is formalized as a Markov Decision Process $(S, A, T, R, \gamma)$, with each level of the hierarchy learning a separate Q-network:

  • Meta-controller update (extrinsic reward):

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

optimizing $E\left[\sum_t \gamma^t r_e(s_t, c_t)\right]$.

  • Controllers (intrinsic reward):

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r_{\text{intrinsic}} + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

  • Alternating update and experience replay:

The algorithm alternates between (1) updating the meta-controller on global states and extrinsic outcomes and (2) letting the selected agent pair negotiate, training their respective controllers on local observations and intrinsic rewards, as sketched below.
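The following sketch outlines this alternation in Python. It is schematic and rests on stated assumptions: `env.global_state`, `env.negotiate`, and `env.outcome` form a hypothetical environment interface, the Q-functions are tabular `defaultdict`s standing in for the paper's deep Q-networks, and the update rule is the one-step target shown above.

```python
import random
from collections import defaultdict, deque

R_M = deque(maxlen=10_000)   # meta-controller replay buffer
R_C = deque(maxlen=10_000)   # agent-controller replay buffer

Q_meta = defaultdict(float)  # tabular stand-ins for the two Q-networks
Q_ctrl = defaultdict(float)

def td_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One-step Q-learning update toward r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def train_step(env, meta_actions, ctrl_actions, batch=32):
    # (1) Meta-controller observes the global state and picks (pair, constraint);
    # greedy here, with exploration omitted for brevity.
    s = env.global_state()
    g = max(meta_actions, key=lambda a: Q_meta[(s, a)])

    # (2) The selected pair negotiates; controllers collect intrinsic rewards.
    for obs, a, r_int, obs_next in env.negotiate(g):
        R_C.append((obs, a, r_int, obs_next))
    s_next, r_ext = env.outcome()
    R_M.append((s, g, r_ext, s_next))

    # (3) Experience replay refreshes both levels from their own buffers.
    for s0, a0, r0, s1 in random.sample(list(R_C), min(batch, len(R_C))):
        td_update(Q_ctrl, s0, a0, r0, s1, ctrl_actions)
    for s0, a0, r0, s1 in random.sample(list(R_M), min(batch, len(R_M))):
        td_update(Q_meta, s0, a0, r0, s1, meta_actions)
```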

This division of learning responsibility, together with modular experience replay, enables robust multi-agent coordination as agent count and environmental complexity grow.

8. Theoretical and Practical Implications

By structuring agent negotiation and decision making hierarchically, MAHTG achieves:

  • Efficient credit assignment through localized intrinsic rewards.
  • Reduced exploration space via task decomposition and structured agent pairings.
  • Scalable communication (pairwise negotiations abstracted by meta-controller choices).
  • Interpretability and modular design when combined with symbolic Petri nets, enabling formal verification (Figat et al., 2019).
  • Automatable controller synthesis: From formal specification (HPN) to real-time implementation (Figat et al., 2019).

Empirical validation, together with the formal modeling afforded by HPNs, indicates that MAHTG provides a generalizable and robust approach for complex multi-agent scheduling, coordination, and distributed control, demonstrated concretely on distributed scheduling and robot control tasks (Kumar et al., 2017, Figat et al., 2019).


In summary, Multi-Agent Hierarchical Task Generation formalizes the decomposition and orchestration of multi-agent coordination problems into tractable, modular subtasks, leveraging hierarchical control, structured communication, and modular design. These strategies deliver scalable, efficient, and generalizable solutions, bridging theoretical advances in hierarchical reinforcement learning and formal verification with practical multi-agent system design (Kumar et al., 2017, Figat et al., 2019, Carvalho et al., 2022).
