Agentic Workloads Overview
- Agentic workloads are autonomous multi-step computational pipelines where LLM agents integrate dynamically with tools to plan, execute, and adapt actions.
- They rely on dynamic control flow, resource-aware scheduling, and hierarchical caching to boost throughput, reduce latency, and improve reliability.
- These workflows incorporate safe-by-design measures and robust governance, validated across applications in RL, cloud automation, and 6G networks.
Agentic workloads are computational workflows in which autonomous software agents—most prominently those built around LLMs with integrated tool use—independently reason, plan, and take actions through multi-stage, dynamically structured sequences. These workflows fundamentally differ from both traditional, static data processing pipelines and from isolated LLM inference calls, as they exhibit deep heterogeneity in computational phases, resource demands, control flow, and integration requirements for memory, external tools, and system coordination. Recent research has formalized, optimized, and benchmarked agentic workloads in diverse domains, including data engineering, cloud governance, RL, large-scale orchestration, on-device inference, and resource-managed LLM clusters.
1. Defining Agentic Workloads: Structure and Properties
Agentic workloads instantiate as end-to-end, multi-step computational tasks decomposed into sequences (or DAGs) of sub-requests—such as LLM calls, tool invocations, data fetches, and validation steps—where the execution plan may be dynamically determined at runtime. Each agent, typically an LLM augmented with tool-calling and planning APIs, autonomously orchestrates workflow progression, performing context-dependent reasoning and acting on observations from previous steps (Pagonas et al., 15 Oct 2025, Tagliabue et al., 10 Oct 2025, Asgar et al., 25 Jul 2025, Giurgiu et al., 10 Dec 2025).
Formally, a canonical agentic workload can be described as a call graph G = (V, E), where nodes represent discrete stages (LLM calls, tool uses), edges capture data or control dependencies, and per-node metadata includes service time s_v, memory requirement m_v, and selectivity f_v, the fraction of requests traversing the node (Pagonas et al., 15 Oct 2025). This paradigm generalizes across batch and streaming data pipelines; sensor/compute/control fusion (6G networks); RL with environmental rollouts; and multi-agent federations.
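The call-graph model above can be sketched in a few lines. The class and field names (`Stage`, `service_time`, `selectivity`) are illustrative choices of ours, not identifiers from the cited systems; the example shows how per-node selectivity f_v weights each stage's service time s_v when estimating expected per-request cost:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the call-graph model G = (V, E); names are ours,
# not from the cited work.

@dataclass
class Stage:
    name: str
    service_time: float   # s_v: mean service time per request (seconds)
    memory: float         # m_v: memory footprint (GB)
    selectivity: float    # f_v: fraction of requests traversing this node
    children: list = field(default_factory=list)

def expected_cost(root: Stage) -> float:
    """Expected per-request compute time: sum of f_v * s_v over all stages."""
    total = root.selectivity * root.service_time
    for child in root.children:
        total += expected_cost(child)
    return total

plan = Stage("plan", 0.8, 4.0, 1.0)
tool = Stage("tool_call", 0.3, 0.5, 0.6)   # only 60% of requests call the tool
answer = Stage("answer", 1.2, 4.0, 1.0)
plan.children = [tool, answer]

print(round(expected_cost(plan), 2))       # 0.8 + 0.18 + 1.2 = 2.18
```

A scheduler can use such per-stage estimates to decide where GPU time is actually spent before provisioning pools.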
Key properties:
- Autonomy: Agents decide which actions or transformations to execute at runtime.
- Dynamic control flow: Topology may evolve, with new downstream steps instantiated based on intermediate results.
- Tool-centric execution: Agents invoke external APIs, database queries, and environment simulation channels.
- Multi-modality and federation: Agents coordinate across operational domains (text, vision, control) and share intermediate results via caches, message buses, or semantic indices (Giurgiu et al., 10 Dec 2025).
This structure distinguishes agentic workloads from static, batch DAGs (classic ETL), from single-shot LLM inference, and from simple tool-assisted AI automation.
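The dynamic-control-flow property can be made concrete with a toy agent loop: the next step is chosen at runtime from the previous observation rather than fixed in advance. The tools and the decision rule below are stand-ins of our own devising (a real agent would consult an LLM at the marked point), not any system described in this article:

```python
# Toy sketch of dynamic control flow: the "agent" picks its next action
# at runtime from the previous observation. Tools and the decision rule
# are illustrative stand-ins only.

def search(q):
    return f"results for {q!r}"

def calculate(expr):
    return eval(expr)  # toy only; never eval untrusted input

TOOLS = {"search": search, "calculate": calculate}

def run_agent(task: str, max_steps: int = 5):
    trace, observation = [], task
    for _ in range(max_steps):
        # Stand-in "policy": a real agent would call an LLM here.
        if any(ch.isdigit() for ch in observation) and "+" in observation:
            action, arg = "calculate", observation
        elif observation.startswith("results"):
            break                          # agent decides it is done
        else:
            action, arg = "search", observation
        observation = str(TOOLS[action](arg))
        trace.append((action, observation))
    return trace

print(run_agent("2+2"))
```

Because the topology emerges step by step, the serving system cannot pre-plan the full DAG, which is exactly what motivates the runtime scheduling techniques in the next section.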
2. Resource Management, Scheduling, and Infrastructure
Agentic workloads require non-trivial resource orchestration across heterogeneous compute and memory environments. Traditional monolithic serving architectures are inadequate for delivering predictable performance and SLO guarantees under the bursty, phase-shifting, and highly parallel patterns characteristic of agentic workflows. Recent frameworks address these challenges through explicit decomposition, dynamic scheduling, and resource specialization:
- Stage Isolation and Resource Pooling: Cortex provisions dedicated engine pools per workflow stage (e.g., one per unique LLM prompt or tool type), enforcing stage isolation in both compute and KV-cache memory. This isolation enables precise autoscaling and queue-management based on real-time metrics, and leads to sharper KV reuse and reduced cross-stage interference (Pagonas et al., 15 Oct 2025). Resource partitioning is guided by observed per-stage arrival rates λ_i and service times s_i, with cost optimization over the total number of GPUs/CPUs.
- Program-Aware Scheduling: In ThunderAgent, agentic workflows are treated as persistent “LLM programs” with explicit state (context length, execution phase, resource affiliation) (Kang et al., 14 Feb 2026). Programs can be paused and restored across nodes; eviction and migration minimize recomputation cost (quadratically scaling in KV-cache size). A global queue enables workload balancing and coordinated cache management to maximize throughput.
- Just-in-Time Model Routing: The Aragog system decouples static accuracy routing from dynamic, per-stage cost-based scheduling, adapting model selection for each workflow stage as a function of up-to-the-moment GPU queue occupancy and model availability (Dai et al., 26 Nov 2025). This enables maximized throughput and SLO compliance under varying system load.
- Micro-batching and Mixed Scheduling: On CPU/GPU hybrid systems, as in (Raj et al., 1 Nov 2025), techniques such as CPU/GPU-aware micro-batching (CGAM) and multi-queue, heterogeneity-aware scheduling (MAWS) are necessary to prevent resource bottlenecks, given that tool executions may dominate total latency (up to 90.6%) and CPU energy consumption (up to 44%).
These resource strategies are unified by explicit mathematical models (M/M/k queues, DAG-based cost functions, utilization metrics) and are empirically validated at scale, e.g., in clusters orchestrating tens of thousands of concurrent agent–environment interactions (Zhang et al., 12 Jan 2026).
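The M/M/k queueing models mentioned above can be sketched directly: size each stage's engine pool as the smallest k whose Erlang-C waiting probability stays below a target, given the stage's arrival rate λ and service time s. The function names and the 0.2 waiting-probability target are our own illustrative choices, not values from the cited papers:

```python
from math import factorial

# Hedged sketch of M/M/k-style pool sizing; parameter names and the
# waiting-probability target are illustrative, not from the cited work.

def erlang_c(k: int, offered_load: float) -> float:
    """P(request waits) in an M/M/k queue with offered load a = lambda * s."""
    a = offered_load
    top = (a ** k / factorial(k)) * (k / (k - a))
    bottom = sum(a ** i / factorial(i) for i in range(k)) + top
    return top / bottom

def size_pool(arrival_rate: float, service_time: float,
              max_wait_prob: float = 0.2) -> int:
    a = arrival_rate * service_time   # offered load, in "engines" of work
    k = max(1, int(a) + 1)            # k must exceed the load for stability
    while erlang_c(k, a) > max_wait_prob:
        k += 1
    return k

# A stage seeing 12 req/s with 0.5 s mean service time carries a load of
# 6 engines; meeting the waiting target requires some headroom above that.
print(size_pool(12.0, 0.5))           # 9
```

Running this per stage, with measured λ_i and s_i, yields the kind of per-pool provisioning that stage-isolated systems such as Cortex automate.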
3. Optimization, Caching, and Performance Engineering
Agentic workloads stress both compute and memory hierarchies, especially in long-context LLM inference and multi-agent batch serving. Key optimization pathways include:
- Hierarchical and Semantic Caching: Cortex and similar systems instantiate multi-tiered cache topologies—per-engine GPU DRAM (KV-cache), per-pool shared RAM for fast stage-local reuse, and distributed cluster-wide memory for cross-request artifact reuse (Pagonas et al., 15 Oct 2025). Semantic cache tagging and priority-aware eviction policies are shown to restore hit rates (to ≈90%) and reduce thrashing in agentic workloads exhibiting large amounts of prompt or context reuse (Biswas et al., 19 Jan 2026). Micro-caching of semantic fragments among federated agents further accelerates multi-agent collaboration and reduces redundant computation (Giurgiu et al., 10 Dec 2025).
- Batch Query Optimization: Halo represents the agentic batch serving problem as a consolidated DAG, merging overlapping operators across queries, enabling adaptive batching, and maximizing GPU utilization through cache migration and compute–communication overlap (Shen et al., 2 Sep 2025). These optimizations yield substantial speedups in both batch inference and online serving, with strict preservation of semantic correctness.
- Hardware–Software Co-Design: Workloads characterized by massive contexts and memory walls (e.g., LLM runs on chat histories or tool trace logs) are addressed through custom hardware, mixed-precision quantization, and systolic array architectures (e.g., PLENA), achieving 2.24× the throughput of A100 GPUs (Wu et al., 11 Sep 2025).
- Phase-Aware, Adaptive Resource Usage: Fine-tuned orchestration accounts for the phase-specific dependency of throughput on context length, current batch structure, and I/O patterns (prefill vs. decode).
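The tiered, priority-aware caching pattern described above can be sketched as a two-level cache in which entries evicted from a small "GPU" tier are demoted to a larger "RAM" tier rather than discarded, and high-priority entries (e.g., a shared system prompt's KV blocks) resist eviction. The class, its slot counts, and the priority scheme are our own toy construction, not the Cortex design:

```python
from collections import OrderedDict

# Toy sketch of tiered, priority-aware caching; all names and policies
# here are illustrative, not taken from the cited systems.

class TieredCache:
    def __init__(self, gpu_slots: int, ram_slots: int):
        self.gpu = OrderedDict()   # key -> (value, priority), LRU order
        self.ram = OrderedDict()
        self.gpu_slots, self.ram_slots = gpu_slots, ram_slots

    def put(self, key, value, priority: int = 0):
        self.gpu[key] = (value, priority)
        self.gpu.move_to_end(key)
        if len(self.gpu) > self.gpu_slots:
            # Demote the lowest-priority entry (LRU among ties) to RAM.
            victim = min(self.gpu, key=lambda k: self.gpu[k][1])
            self.ram[victim] = self.gpu.pop(victim)
            if len(self.ram) > self.ram_slots:
                self.ram.popitem(last=False)   # drop oldest from RAM tier

    def get(self, key):
        if key in self.gpu:
            self.gpu.move_to_end(key)
            return self.gpu[key][0]
        if key in self.ram:                     # promote back on a RAM hit
            value, priority = self.ram.pop(key)
            self.put(key, value, priority)
            return value
        return None
```

Pinning a shared prompt with a high priority keeps it resident in the fast tier while per-request context churns through the lower tier, which is the mechanism behind the restored hit rates reported above.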
Performance evaluations across diverse agentic microbenchmarks consistently report reductions in end-to-end tail latency (10–30%), decreases in SLO violations (up to 75% reduction), and significant improvements in throughput compared to static and naive baselines.
4. Safe-by-Design and Governance in Agentic Data Pipelines
Agentic workloads present substantial challenges for correctness, reproducibility, and operational safety, especially in data transformation and governance applications. Research in agentic lakehouses and cloud engineering systems introduces foundational concepts and abstractions:
- API-First, Branch-and-Merge Flows: All agentic changes are mediated via programmable APIs, executed in copy-on-write branches with later atomic merges contingent on formal verification (e.g., proof-carrying code) (Tagliabue et al., 10 Oct 2025). This approach hardens data infrastructure against untrusted or faulty agents, ensuring composability, reproducibility, and reversibility of actions.
- Bounded AI Agents Under Policy and Compliance Constraints: Operational decision-making is modeled as bounded agents constrained by a set of declarative policies (Kirubakaran et al., 24 Dec 2025). All agent actions undergo explicit validation functions (e.g., cost, compliance) before execution. Reported results demonstrate 45% reductions in recovery time and over 70% decreases in manual interventions, with strict policy compliance.
- Continuous Verification and Human-in-the-Loop Controls: Agentic changes require machine-checked proofs, and operations invoked by agents are observable, auditable, and reversible through API logs, provenance tracking, and explicit human review or approval steps (Tagliabue et al., 10 Oct 2025).
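The branch-and-merge pattern above can be reduced to a minimal sketch: agent edits land on a copy of the state, and the merge back to main is atomic and gated on validators (a stand-in here for the proof-carrying and policy checks in the cited work). The function names and the toy full-copy "branch" are our own simplifications:

```python
# Hedged sketch of API-first branch-and-merge; validators stand in for
# the formal verification / policy checks described above.

def branch(table: dict) -> dict:
    return dict(table)                 # copy-on-write branch (toy: full copy)

def merge(main: dict, branch_state: dict, validators) -> bool:
    for check in validators:
        if not check(branch_state):
            return False               # reject: main stays untouched
    main.clear()
    main.update(branch_state)          # atomic swap of the validated state
    return True

no_nulls = lambda t: all(v is not None for v in t.values())

main = {"rows": 100, "owner": "data-eng"}
b = branch(main)
b["rows"] = None                       # faulty agent edit
print(merge(main, b, [no_nulls]))      # False: merge rejected
print(main["rows"])                    # 100: main is unchanged
```

Because the faulty branch never reaches main, the agent's action is reversible by construction, which is the safety property the lakehouse designs formalize.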
5. Application Domains and Representative Deployment Patterns
Agentic workloads are validated in a broad range of system contexts:
- Reinforcement Learning and Multi-Agent Training: Distributed platforms such as MegaFlow and RollArc orchestrate agent–environment rollouts across thousands of GPUs, using disaggregation, hardware affinity, and asynchrony to maximize throughput for multi-task RL and code generation agents. RollArc, for example, splits compute-heavy and bandwidth-bound workloads across best-fit accelerators and offloads stateless reward computation to serverless environments, achieving 1.35–2.05× training speedups (Gao et al., 27 Dec 2025, Zhang et al., 12 Jan 2026).
- 6G and Real-Time Autonomy: Agentic AI-RAN architectures integrate sensing, communication, computation, and control (SC3) in a closed-loop over networks of UAVs, allocating edge compute resources to ensure guaranteed sub-1 s latencies under stringent reliability constraints (Sun et al., 23 Jan 2026).
- Cloud and Organizational Process Automation: Agentic AI is transforming manual, coordination-heavy workflows into orchestrator/agent architectures. Transition frameworks emphasize domain-driven decomposition, MCP-style agent APIs, and modular, iterative deployment patterns for scalable automation in business processes (Bandara et al., 27 Jan 2026).
- Energy and Cost Optimization: Hybrid Edge Cloud (HEC) deployment of agentic workloads, especially in IoT and real-time autonomous systems, substantially reduces both per-device and aggregate energy/cost footprints (up to 75–80% reductions versus centralized cloud) (Alamouti, 21 Jan 2025).
6. Security, Lifecycle Management, and Open Research Challenges
Securing agentic workloads requires novel defense-in-depth approaches given their autonomy and adaptive behaviors. The MAAIS framework and the CIAA (Confidentiality, Integrity, Availability, Accountability) model augment classical security triads with detailed auditability and provenance metrics, mapping each defense layer to industry-standard attack taxonomies such as MITRE ATLAS (Arora et al., 19 Dec 2025). Defense-in-depth recommendations span:
- Hardware roots of trust and secure enclaves
- API confinement and sandboxed execution
- Provenance and immutable audit logs
- Automated anomaly detection and adversarial robustness
Empirically, multilayered controls demonstrated end-to-end coverage (100% mapping) of adversarial tactics, reducing mean time-to-detect from 24 h to 45 min and raising resilience (integrity) scores from 0.85 to 0.98 under simulated attacks.
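The "provenance and immutable audit logs" control can be illustrated with an append-only hash chain over agent actions: each record commits to its predecessor's hash, so tampering with any earlier entry invalidates every later one. The record layout and field names below are our own sketch, not a specification from the cited framework:

```python
import hashlib
import json

# Illustrative append-only audit log as a hash chain; field names are
# our own, not from the cited security framework.

def append_entry(log: list, action: dict) -> None:
    prev = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({"prev": prev, "action": action}, sort_keys=True)
    log.append({"prev": prev, "action": action,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify(log: list) -> bool:
    prev = "0" * 64
    for entry in log:
        payload = json.dumps({"prev": prev, "action": entry["action"]},
                             sort_keys=True)
        if entry["prev"] != prev or \
           hashlib.sha256(payload.encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, {"agent": "a1", "op": "scale_pool", "arg": 4})
append_entry(log, {"agent": "a1", "op": "evict_cache", "arg": "stage-2"})
print(verify(log))                     # True
log[0]["action"]["arg"] = 99           # tamper with history
print(verify(log))                     # False
```

Canonical JSON serialization (`sort_keys=True`) keeps the hash stable across dict orderings; a production log would additionally sign entries and anchor the chain externally.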
Key open challenges remain in cost modeling for multi-modal agentic workflows, cache consistency and privacy in federated caches, adaptive control under non-stationary conditions, and the development of unified, behaviorally-responsive data and compute infrastructures tailored to agentic execution (Giurgiu et al., 10 Dec 2025).
References:
- Cortex, Stage Isolation, and Agentic Scheduling: (Pagonas et al., 15 Oct 2025)
- Proof-Carrying Agents in Agentic Lakehouse: (Tagliabue et al., 10 Oct 2025)
- Policy-Bounded Agentic Control in Data Engineering: (Kirubakaran et al., 24 Dec 2025)
- ThunderAgent, Program-Aware Scheduling: (Kang et al., 14 Feb 2026)
- Sutradhara, Orchestration/Engine Co-design: (Biswas et al., 19 Jan 2026)
- Batch Query Optimization (Halo): (Shen et al., 2 Sep 2025)
- MegaFlow and RollArc, Distributed RL: (Zhang et al., 12 Jan 2026, Gao et al., 27 Dec 2025)
- CPU-Centric Characterization: (Raj et al., 1 Nov 2025)
- Agentic AI-RAN for 6G SC3: (Sun et al., 23 Jan 2026)
- Edge-Cloud Energy/Cost Analysis: (Alamouti, 21 Jan 2025)
- Security Frameworks (MAAIS, CIAA): (Arora et al., 19 Dec 2025)
- Agentic Workflow Scaling on Heterogeneous Systems: (Asgar et al., 25 Jul 2025)
- Internet of Agentic AI, Distributed Teaming: (Yang et al., 3 Feb 2026)
- Agentic AI Transition in Organizations: (Bandara et al., 27 Jan 2026)