Hierarchical Agent Workflow (HAWK)

Updated 30 November 2025

Hierarchical Agent Workflow (HAWK) is a modular, multi-level paradigm for decomposing tasks into a DAG of goals and constraints, enabling adaptive orchestration of heterogeneous LLM agents.
It employs formal task decomposition, interactive oversight, and conflict resolution mechanisms that ensure robust execution and consistency across agent actions.
HAWK underpins state-of-the-art multi-agent systems used in domain-general automation, scientific workflows, and specialized industrial applications, with reported gains on benchmarks such as GAIA, MMLU, and MATH.

A Hierarchical Agent Workflow (HAWK) is a modular, multi-level paradigm for decomposing and orchestrating goal-oriented tasks among heterogeneous agents—typically LLM-based or tool-augmented—by structuring global user intent as a directed acyclic graph (DAG) of hierarchical goals, constraints, and agent assignments, enabling robust collaboration, adaptive scheduling, and interactive oversight. HAWK is foundational to state-of-the-art orchestration in multi-agent LLM frameworks, supporting domains ranging from general automation and cross-domain assistance to specialized industrial, scientific, and artistic workflows (Zhou, 28 Oct 2025, Hu et al., 29 May 2025, Hou et al., 17 May 2025, Li et al., 21 Nov 2025).

1. Formal Model: Hierarchical Structure and Constraints

HAWK imposes an explicit formalism for task decomposition and inter-agent coordination. Core constructs include:

Goal hierarchy: Let $V = \{g_1, \dots, g_n\}$ be goals, and $E \subset V \times V$ the directed parent $\rightarrow$ child “decomposition” relation. The goal graph $G = (V, E)$ forms a rooted DAG; leaves represent “primitive” goals mapped to direct agent actions (Zhou, 28 Oct 2025).
Decomposition function: $f: V \rightarrow 2^V$ associates each goal with its immediate subgoals, i.e., $f(g) = \{h \mid (g, h) \in E\}$.
Agents and capabilities: Let $A = \{a_1, \dots, a_m\}$ be the set of agents, where $skill(a) \subseteq Capabilities$ is the profile of capabilities agent $a$ can execute and $R(g) \subseteq Capabilities$ is the capability signature required by goal $g$.
Constraint set $C$: Each element is one of:
- Hard precedence $(g_i \prec g_j)$: $g_i$ must complete before $g_j$ starts,
- Resource conflict $(g_i \nleftrightarrow g_j)$: $g_i$ and $g_j$ cannot run in parallel,
- Temporal (“no overlap”, “start-by”, etc.) (Zhou, 28 Oct 2025, Li et al., 21 Nov 2025).
Executable task graph $T$: The subgraph induced on the leaves of $G$, plus precedence and dataflow arcs.

Predicates $P_g: State \rightarrow \{true, false\}$ give a machine-verifiable specification of goal completion and constraint adherence at runtime.
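The constructs above can be held in a few plain data structures. The Python sketch below is illustrative only: the class and field names (`Goal`, `Agent`, `Constraint`, `GoalGraph`), the trip-planning goals, and the use of a dict as the runtime state are hypothetical, and only the precedence and resource-conflict constraint kinds are modelled.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable

State = dict  # hypothetical: runtime state represented as a plain dictionary

@dataclass
class Goal:
    name: str
    required: frozenset = frozenset()   # R(g): capability signature the goal needs
    # P_g: machine-checkable completion predicate over the runtime state
    predicate: Callable[[State], bool] = field(default=lambda state: False)

@dataclass
class Agent:
    name: str
    skill: frozenset                    # skill(a): capabilities the agent can execute

@dataclass(frozen=True)
class Constraint:
    kind: str   # "precedes" (g_i before g_j) or "conflict" (no parallel execution)
    gi: str
    gj: str

@dataclass
class GoalGraph:
    goals: dict = field(default_factory=dict)  # name -> Goal, i.e. the vertex set V
    edges: set = field(default_factory=set)    # E: (parent, child) decomposition pairs

    def add(self, goal: Goal, parent: str | None = None) -> None:
        self.goals[goal.name] = goal
        if parent is not None:
            self.edges.add((parent, goal.name))

    def f(self, g: str) -> set:
        """Decomposition function: f(g) = {h | (g, h) in E}."""
        return {h for (p, h) in self.edges if p == g}

    def leaves(self) -> set:
        """Primitive goals: L = {g in V | f(g) is empty}."""
        return {g for g in self.goals if not self.f(g)}

# Usage: a two-level hierarchy with one precedence constraint (illustrative names).
G = GoalGraph()
G.add(Goal("plan_trip"))
G.add(Goal("book_flight", frozenset({"flights"}),
           lambda s: s.get("flight_booked", False)), parent="plan_trip")
G.add(Goal("book_hotel", frozenset({"hotels"})), parent="plan_trip")
C = {Constraint("precedes", "book_flight", "book_hotel")}
```

Keeping $E$ as an explicit set of pairs makes $f$ a direct transcription of its definition; a production system would more likely index children per parent.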

2. Pipeline: Goal Parsing, Alignment, and Human Oversight

The HAWK orchestration pipeline in leading systems proceeds as follows:

Natural Language Parsing: LLM-based modules extract candidate goal texts ($G_0$) and their decomposition structure ($E_0$) from free-form user instructions and normalize attributes such as budget and date (Zhou, 28 Oct 2025).
Ontology Grounding & Predicate Synthesis: Each goal is grounded to a domain ontology and equipped with a predicate $P_g$ for automated status checking.
Interactive Editing: The UI exposes the high-level goal tree; users can edit, reorder, or constrain nodes, or insert new subgoals. Each edit triggers grammar and ontology compliance checks.
Consistency Repair: The system enforces well-formedness and surfaces inconsistencies for the user to resolve.
Confirmation: The finalized DAG $(V, E)$ and constraint set $C$ proceed to orchestration.
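The Consistency Repair step can be illustrated with a well-formedness check. The sketch below is a hypothetical helper, not the OrchVis implementation: it assumes goals are given as a set of names, decomposition arcs and precedence constraints as pairs, and that the two arc types are checked for cycles separately. It uses Python's standard `graphlib` module (3.9+). With a single root and an acyclic decomposition, every goal is reachable from the root, so no separate reachability pass is needed. Issues are returned rather than auto-fixed, matching the practice of surfacing inconsistencies to the user.

```python
from graphlib import CycleError, TopologicalSorter

def _cycle(nodes, arcs):
    """Return one cycle (a list of nodes) if the arcs contain a cycle, else None."""
    preds = {n: set() for n in nodes}
    for u, v in arcs:
        preds.setdefault(v, set()).add(u)
    try:
        list(TopologicalSorter(preds).static_order())
        return None
    except CycleError as err:
        return err.args[1]  # graphlib stores the detected cycle in args[1]

def check_well_formed(goals, edges, precedes=()):
    """Return human-readable issues; an empty list means the goal DAG is well-formed."""
    issues = []
    unknown = {x for arc in [*edges, *precedes] for x in arc} - set(goals)
    if unknown:
        issues.append(f"unknown goals: {sorted(unknown)}")
    roots = sorted(set(goals) - {child for _, child in edges})
    if len(roots) != 1:
        issues.append(f"expected one root, found {roots}")
    if (c := _cycle(goals, edges)) is not None:
        issues.append(f"decomposition cycle: {c}")
    if (c := _cycle(goals, precedes)) is not None:
        issues.append(f"precedence cycle: {c}")
    return issues
```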

The OrchVis “Planning Panel” (Zhou, 28 Oct 2025) exemplifies advanced human-in-the-loop orchestration: it synchronizes a collapsible tree of goals and constraints with a task-graph view of agent assignments and tool invocations (via REST/WebSocket APIs), and supports reordering, visualization, and selective expansion to reduce cognitive load (user effort scales from $O(|V|)$ under direct manipulation down to $O(\log |V|)$ when only the top levels are shown).
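The effort scaling of selective expansion can be illustrated with a short sketch. It assumes, hypothetically, that the panel renders a node's children only when that node is expanded and that user effort is proportional to the number of rendered nodes; the function names are illustrative. In a complete binary tree of depth 9 (1023 goals), expanding everything renders all 1023 nodes, while expanding only one root-to-leaf path renders 19. The $O(\log |V|)$ bound holds only for roughly balanced trees with bounded branching; a chain-like hierarchy has depth $O(|V|)$.

```python
def visible_nodes(children, root, expanded):
    """Nodes rendered when only nodes in `expanded` show their children."""
    shown, stack = [], [root]
    while stack:
        node = stack.pop()
        shown.append(node)
        if node in expanded:
            stack.extend(children.get(node, []))
    return shown

def complete_tree(branching, depth):
    """Complete tree with integer node ids; node 0 is the root. Returns (children, |V|)."""
    children, frontier, next_id = {}, [0], 1
    for _ in range(depth):
        new_frontier = []
        for node in frontier:
            kids = list(range(next_id, next_id + branching))
            next_id += branching
            children[node] = kids
            new_frontier.extend(kids)
        frontier = new_frontier
    return children, next_id

children, n = complete_tree(2, 9)
path, node = [], 0
while node in children:          # follow the first child down to a leaf
    path.append(node)
    node = children[node][0]
full = visible_nodes(children, 0, set(children))   # every internal node expanded
focused = visible_nodes(children, 0, set(path))    # only one root-to-leaf path expanded
print(n, len(full), len(focused))  # 1023 1023 19
```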

3. Orchestration Algorithm: Agent Matching and Adaptive Execution

HAWK workflows instantiate deterministic assignment and runtime verification as follows (Zhou, 28 Oct 2025):

Task Extraction: Identify the leaves $L = \{g \in V \mid f(g) = \emptyset\}$.
Agent Assignment: For each $g \in L$, select an agent whose skills cover the goal's requirements, $a^* \in \{a \in A \mid skill(a) \supseteq R(g)\}$; fall back to the user if no agent matches.
Task Graph Instantiation: Record the mapping $M[g] := a^*$ and add constraint-induced precedence arcs.
Execution Loop: Topologically sort $T$ into a queue $Q$. For each $g \leftarrow pop\_front(Q)$:
- Dispatch $M[g].execute(g)$ and collect the result $r_g$.
- Verify with $Verifier(g, r_g)$, which reports hard and soft constraint satisfaction (see below).
- On hard constraint violation, invoke conflict resolution module:
  - Pause subsequent tasks, compute difference report $\Delta$ , present auto-generated repair proposals, allow user-selected “repair” or auto-replan with LLM-generated alternative plans.
Conflict Recovery: Upon resolution, resume from corrected node.
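The steps above can be sketched in Python under simplifying assumptions: agents are tried in registration order (the source does not specify a tie-break), `execute` and `verify` are caller-supplied stand-ins for agent dispatch and the Verifier, and the conflict-resolution module is reduced to raising a hypothetical `HardViolation` exception that pauses the loop. Topological ordering uses the standard `graphlib` module.

```python
from graphlib import TopologicalSorter

class HardViolation(Exception):
    """Raised to pause execution so the user (or a replanner) can repair the plan."""

def assign(leaves, required, agents):
    """Map each primitive goal g to the first agent a with skill(a) >= R(g).

    `agents` maps agent name -> frozenset of skills; dict insertion order is
    the (assumed) deterministic tie-break. Uncovered goals map to None,
    meaning they are escalated to the user.
    """
    return {
        g: next((a for a, skills in agents.items() if required[g] <= skills), None)
        for g in sorted(leaves)
    }

def run(leaves, precedes, mapping, execute, verify):
    """Execute primitive goals in topological order, halting on a hard violation."""
    preds = {g: set() for g in sorted(leaves)}
    for gi, gj in precedes:              # (g_i, g_j): g_i must finish before g_j
        preds[gj].add(gi)
    results = {}
    for g in TopologicalSorter(preds).static_order():
        agent = mapping[g]
        if agent is None:
            raise HardViolation(f"no agent can execute {g}; escalate to user")
        r = execute(agent, g)
        hard_ok, _soft_ok = verify(g, r)  # soft results would feed S(g), not halt
        if not hard_ok:
            # Downstream goals are never dispatched, so inconsistent state cannot propagate.
            raise HardViolation(f"hard constraint violated at {g}: {r!r}")
        results[g] = r
    return results

# Usage with two hypothetical agents and a flight-before-hotel precedence.
agents = {"flight_bot": frozenset({"flights"}), "hotel_bot": frozenset({"hotels"})}
required = {"book_flight": frozenset({"flights"}), "book_hotel": frozenset({"hotels"})}
leaves = set(required)
M = assign(leaves, required, agents)
log = []

def execute(agent, g):
    log.append(g)
    return f"{agent} did {g}"

out = run(leaves, [("book_flight", "book_hotel")], M, execute, lambda g, r: (True, True))
```

Raising an exception is only the simplest way to stop the loop; an interactive system would instead persist the paused queue so that execution can resume from the corrected node.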

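Quantitative Metric: The only formal satisfaction score defined in OrchVis is a per-goal score:

$S(g) = \frac{|\{\text{hard constraints of } g \text{ satisfied}\}|}{|\{\text{hard constraints of } g\}|} + \lambda \cdot \frac{|\{\text{soft constraints of } g \text{ satisfied}\}|}{|\{\text{soft constraints of } g\}|}$, where $\lambda$ weights soft-constraint satisfaction relative to hard-constraint satisfaction.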

Violation of hard constraints halts downstream execution, precluding propagation of inconsistent states (Zhou, 28 Oct 2025).
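As a worked instance of $S(g)$: with all 3 hard constraints satisfied, 2 of 4 soft constraints satisfied, and $\lambda = 0.5$, the score is $3/3 + 0.5 \cdot 2/4 = 1.25$. The sketch below computes this; the value of $\lambda$ is arbitrary, and treating a goal with no constraints in a category as fully satisfying it (ratio 1.0) is an assumption, since the source does not address that case.

```python
def satisfaction(hard, soft, lam=0.5):
    """S(g) = (#hard true / #hard) + lam * (#soft true / #soft).

    hard, soft: lists of booleans, one per constraint attached to goal g.
    lam: weight on soft constraints (0.5 is an illustrative choice).
    An empty category contributes a ratio of 1.0 (assumption).
    """
    def ratio(flags):
        return sum(flags) / len(flags) if flags else 1.0
    return ratio(hard) + lam * ratio(soft)

# 3/3 hard satisfied, 2/4 soft satisfied, lam = 0.5 -> 1.0 + 0.25
score = satisfaction([True, True, True], [True, False, True, False])
```

Because any hard violation halts downstream execution, goals that complete normally have a hard-constraint term of 1, and $S(g)$ then varies only through the soft-constraint term.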

4. Comparative Architectures and Generalization Strategies

HAWK generalizes across instantiations, extending to classic three-layer (Planner–Coordinator–Worker) stacks as in OWL/Workforce (Hu et al., 29 May 2025), as well as role-specialist and domain-centric variants:

Instantiation	Planning (top)	Assignment (middle)	Execution (worker)
OrchVis (Zhou, 28 Oct 2025)	LLM goal parser + user	Agent matcher + panel	LLM/tool agent per primitive goal
Workforce/OWL (Hu et al., 29 May 2025)	Domain-agnostic planner	Coordinator (schedule/assign)	Specialized tool workers
HALO (Hou et al., 17 May 2025)	High-level agent	Dynamic role spawner	Low-level agents + MCTS search
HTAM/EarthAgent (Li et al., 21 Nov 2025)	Layered policy LLMs	Task-centric layer assignment	Domain-specific tool agents

Distinctive features:

OWL optimizes the planner with an RL-based recipe (SFT followed by DPO) that maximizes the expected trajectory reward $J(\theta)$, yielding strong generalization (52.73% on GAIA, +16.37% over the base model).
HALO combines a high-level planner that issues subtasks step by step, mid-level role assignment, and low-level agentic workflow search via MCTS for expert tasks, reporting a 14.4% improvement over state-of-the-art baselines on benchmarks including MMLU and MATH.
HTAM structures domain-specialized agents as a layered DAG, enforcing dataflow and procedural constraints by construction.
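One way to read “by construction” is sketched below, under an assumption that is ours rather than HTAM's documented design: each agent is pinned to a numbered layer, and dataflow arcs may only point to a strictly higher layer. Every arc then increases the layer index, so no cycle can form, and sorting agents by layer always yields a valid execution order. The class and agent names are hypothetical.

```python
class LayeredDAG:
    """Agents live on numbered layers; arcs may only point to a strictly higher layer.

    Every accepted arc increases the layer index, so no path can return to its
    start: the graph is acyclic by construction and needs no runtime cycle check.
    """

    def __init__(self):
        self.layer = {}    # agent name -> layer index
        self.arcs = set()  # (producer, consumer) dataflow arcs

    def add_agent(self, name, layer):
        self.layer[name] = layer

    def connect(self, producer, consumer):
        if self.layer[producer] >= self.layer[consumer]:
            raise ValueError(f"{producer} (layer {self.layer[producer]}) cannot feed "
                             f"{consumer} (layer {self.layer[consumer]})")
        self.arcs.add((producer, consumer))

    def schedule(self):
        """A valid execution order: sort by layer, breaking ties by name."""
        return sorted(self.layer, key=lambda a: (self.layer[a], a))

# Usage: a hypothetical four-layer domain pipeline.
dag = LayeredDAG()
for name, layer in [("retrieve", 0), ("preprocess", 1), ("analyze", 2), ("report", 3)]:
    dag.add_agent(name, layer)
dag.connect("retrieve", "preprocess")
dag.connect("preprocess", "analyze")
dag.connect("analyze", "report")
```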

5. API, Tooling, and Quantitative Outcomes

Key implementation and evaluation features:

REST/WebSocket APIs for asynchronous updates and human intervention (e.g., /api/goals, /api/tasks, /api/resolve) (Zhou, 28 Oct 2025).
Continuous Verification: All agent actions are machine-checked; users oversee only flagged conflicts or high-level planning choices.
Visualization: D3/React-based goal and agent graphs, collapsed views, real-time progress subscription.
Empirical Metrics: HAWK-style variants report strong open-source results on GAIA; Workforce with Claude-3.7-Sonnet reaches 69.70%, outperforming commercial baselines (Hu et al., 29 May 2025). Ablations show that planner training, rather than worker training, is central to these gains, and OWL validates domain transfer.
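The endpoint paths listed above come from the source, but their payloads are not described here. The sketch below is a hypothetical, in-process stand-in (no network server) showing how a human-intervention endpoint such as /api/resolve might apply a user-selected repair and resume a paused task. Every field name, status value, and status code mapping is an assumption for illustration, not the OrchVis schema.

```python
import json

# Hypothetical in-memory backend state; payload fields are illustrative only.
state = {
    "goals": [{"id": "g1", "text": "plan trip", "children": ["g2"]},
              {"id": "g2", "text": "book flight", "children": []}],
    "tasks": [{"goal": "g2", "agent": "flight_bot", "status": "paused"}],
    "conflicts": {"c1": {"goal": "g2", "proposals": ["retry", "replan"]}},
}

def resolve(body):
    """Apply a user-selected repair to a flagged conflict and resume its task."""
    conflict = state["conflicts"].get(body.get("conflict"))
    if conflict is None or body.get("choice") not in conflict["proposals"]:
        return 400, {"error": "unknown conflict or repair choice"}
    del state["conflicts"][body["conflict"]]
    for task in state["tasks"]:
        if task["goal"] == conflict["goal"]:
            task["status"] = "resumed"
    return 200, {"goal": conflict["goal"], "applied": body["choice"]}

ROUTES = {
    ("GET", "/api/goals"): lambda body: (200, state["goals"]),
    ("GET", "/api/tasks"): lambda body: (200, state["tasks"]),
    ("POST", "/api/resolve"): resolve,
}

def handle(method, path, raw_body="{}"):
    """Dispatch one request; returns (status, JSON-encoded payload)."""
    route = ROUTES.get((method, path))
    if route is None:
        return 404, json.dumps({"error": "not found"})
    status, payload = route(json.loads(raw_body))
    return status, json.dumps(payload)
```

In a real deployment the progress stream would be pushed over WebSocket rather than polled through `handle`.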

6. Limitations, Cognitive Load, and Future Directions

No Asymptotic Optimality: HAWK frameworks do not assert classical search completeness or optimality (Zhou, 28 Oct 2025).
Cognitive Boundaries: Collapsing subgoal trees reduces human effort, which scales from $O(|V|)$ under direct manipulation to $O(\log|V|)$ when only the top levels are shown (Zhou, 28 Oct 2025).
Limitations: Output quality is bounded by the available agent and tool repertoire; latency in agentic RL feedback loops can be high; specialized domains require precisely specified ontologies.
Extensions: Multi-planner or deeper hierarchies (strategic–tactical phase separation); domain adaptation by registering new agents with specialized APIs; research into auto-stratification and dynamic, cross-layer rollback mechanisms.

7. Significance and Theoretical Foundations

HAWK realizes a principled separation between declarative intent parsing, deterministic agent matching, continuous verification, and conflict-guided human oversight, enabling end-to-end enforcement of hard constraints before downstream action. It supports modular, adaptive, and user-guided orchestration of multi-agent LLM systems, robust to real-world ambiguity and agent failure. These properties establish HAWK as a theoretical and practical blueprint for adaptive, high-reliability large-scale agentic workflows across domains (Zhou, 28 Oct 2025, Hu et al., 29 May 2025, Li et al., 21 Nov 2025).