AgentCgroup: AI Agent Resource Control

Updated 4 July 2026
  • AgentCgroup is a resource governance construct that integrates an AI agent’s identity, goals, and budgets into enforceable OS-level controls.
  • It leverages native mechanisms like Linux cgroups v2 and Windows Job Objects to manage CPU, memory, I/O, and process limits while separating reasoning from execution.
  • Preliminary evaluations report higher survival and lower tail allocation latency under memory pressure when enforcement is applied at tool-call boundaries, and higher serving throughput from fine-grained communication control.

AgentCgroup denotes a family of closely related resource-governance constructs for AI agents. In the "Agent Operating Systems (AOS)" architecture, it is the primitive that binds an agent's identity, goals, and budgets to concrete, enforceable OS resource controls, while preserving the separation between reasoning, execution, and policy and guaranteeing determinism at the enforcement boundary. In Linux-specific work, it is an eBPF-powered, in-kernel controller that aligns cgroup governance with tool-call boundaries. In programmable agentic serving, it appears as a controller-managed grouping capability for dynamically controlling communication and resource behavior at runtime (Sharma et al., 1 Jun 2026, Zheng et al., 10 Feb 2026, Agarwal et al., 6 Jan 2026).

1. Definition and architectural position

Within AOS, agents are treated as first-class entities with agent identity, capability sets, context state, and execution records. AgentCgroup is the execution container beneath that abstraction. It receives scheduler outputs—specifically “bounded reasoning slices” and “execution slices”—as resource intents and enforces them deterministically; separates reasoning-plane resources from execution-plane resources so that unbounded inference cannot starve tool execution or vice versa; provides explicit memory and context bounds for ephemeral context, durable memory, retrieved knowledge, and execution records; anchors least-privilege sandboxes for tool mediation; and exposes observability surfaces that can be correlated with AOS action identifiers to produce decision lineage and audit (Sharma et al., 1 Jun 2026).

The AOS formulation also defines the assumptions that constrain AgentCgroup. Reasoning is probabilistic and potentially adversarial, whereas enforcement must be deterministic and auditable. AOS “does not replace kernels,” does not require inference in kernel space, and does not discard classical abstractions. AgentCgroup is therefore a control-plane construct over OS primitives rather than a replacement for kernel mediation of hardware. This is codified by the paper’s invariants, including “No side-effecting action is executed without a deterministic policy decision of allow,” “Scheduling decisions depend only on observable state and budgets, not on internal reasoning tokens,” and “AOS can tolerate nondeterminism inside reasoning. It cannot tolerate nondeterminism in enforcement” (Sharma et al., 1 Jun 2026).

A narrower but more concrete Linux realization appears in "AgentCgroup: Understanding and Controlling OS Resources of AI Agents," which defines AgentCgroup as an in-kernel resource controller executing inside Linux cgroup v2 enforcement points and aligning governance with the individual tool call as the natural unit of work for coding agents (Zheng et al., 10 Feb 2026). A third formulation, from software-defined agentic serving, uses the term to describe a capability for grouping agents and tools under intent-driven, telemetry-aware communication and serving policies (Agarwal et al., 6 Jan 2026). Taken together, these formulations place AgentCgroup at the boundary between probabilistic agent logic and deterministic systems enforcement.

2. Workload characteristics and the rationale for AgentCgroup

The primary rationale for AgentCgroup is that agent workloads violate the assumptions behind classical processes, threads, files, permissions, and resource schedulers. Traditional abstractions assume bounded execution, stable principals, and observability centered on resource usage rather than decision lineage. Agent workloads are long-lived, opportunistically active, dynamically scoped in capability, and governed by goal progress rather than instruction progress (Sharma et al., 1 Jun 2026).

The Linux characterization study on sandboxed coding agents provides the quantitative basis for this claim. Across 144 SWE-rebench tasks and two LLMs, OS-level execution—container and agent initialization plus tool calls—accounts for 56–74% of end-to-end task latency, while LLM reasoning accounts for 26–44%. Memory, not CPU, is the concurrency bottleneck: average CPU usage is 13.2% of one core for Haiku and 7.6% for GLM, yet peak memory reaches 2–4 GB per agent, capping concurrency at roughly 32–64 instances on a 128 GB host. Memory spikes are tool-call-driven, exhibit a two-layer structure consisting of an approximately 185–200 MB framework baseline plus bursts to 500 MB–2 GB and sometimes around 4 GB, and can reach a peak-to-average ratio of 15.4. Resource demands also vary sharply across tasks, runs, and models: peak memory spans 197 MB to 4 GB, one task exhibited 1.8× run-to-run completion-time variance, and mean CPU usage differed by 1.7× across models (Zheng et al., 10 Feb 2026).
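The concurrency ceiling follows from dividing host memory by the per-agent peak. This assumes each agent is reserved its full peak and ignores memory used by the host itself:

$$N_{\max} = \left\lfloor \frac{128\ \text{GB}}{4\ \text{GB}} \right\rfloor = 32, \qquad N_{\max} = \left\lfloor \frac{128\ \text{GB}}{2\ \text{GB}} \right\rfloor = 64.$$

By contrast, 64 agents averaging 13.2% of one core amount to only about $64 \times 0.132 \approx 8.4$ cores of aggregate CPU demand.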

The study formalizes memory burstiness with the peak-to-average ratio

$$\text{PAR} = \frac{\max_{t \in [0,T]} M(t)}{\frac{1}{T}\int_0^T M(t)\,dt},$$

and reports the discrete-time equivalent for 1 s sampling. In the cited pydicom case, $\max M = 4060$ MB, $\bar{M} = 264$ MB, and $\text{PAR} = 15.4$ (Zheng et al., 10 Feb 2026).
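The discrete-time form can be sketched as follows, assuming evenly spaced samples in MB. The function name and the synthetic trace are illustrative and are not taken from the paper:

```python
from statistics import fmean


def peak_to_average_ratio(samples_mb: list[float]) -> float:
    """Discrete-time PAR over evenly spaced samples (e.g. 1 s memory readings).

    PAR = max_i M_i / ((1/N) * sum_i M_i)
    """
    if not samples_mb:
        raise ValueError("need at least one sample")
    mean = fmean(samples_mb)
    if mean <= 0:
        raise ValueError("mean memory must be positive")
    return max(samples_mb) / mean


# Synthetic 20 s trace (not from the paper): a ~190 MB framework baseline
# followed by a two-second tool-call burst.
trace = [190.0] * 18 + [1500.0, 900.0]
print(round(peak_to_average_ratio(trace), 2))  # → 5.15
```

Because the mean is taken over the whole task, a short burst barely moves the denominator. This is why brief tool calls can push the ratio so high.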

These measurements motivate three mismatches between existing controls and agent workloads. The first is a granularity mismatch: container-level limits do not align with tool-call-level dynamics. The second is a responsiveness mismatch: user-space reaction is too slow for bursts lasting 1–2 s and changing at up to 3 GB/s. The third is an adaptability mismatch: history-based prediction fails for non-deterministic, stateful execution with retries, memory accumulation, and different solution paths across repeated runs (Zheng et al., 10 Feb 2026). In AOS terms, these mismatches explain why “goal progress scheduling” requires an intermediate primitive that can translate agent priorities, budgets, and states into CPU, memory, I/O, and process limits on real machines (Sharma et al., 1 Jun 2026).

A further consequence is resource waste under static reservation. The Linux paper defines waste as

$$W = 1 - \frac{\bar{M}}{M_{\text{limit}}},$$

and gives the pydicom example

$$W = 1 - \frac{264}{4060} \approx 0.935,$$

meaning 93.5% of reserved memory is wasted if limits are set to worst-case peak for the duration of a task (Zheng et al., 10 Feb 2026). This supports the view that agent resource control must be adaptive, scoped, and enforcement-oriented rather than purely allocative.
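When the limit is pinned to the observed peak, the two metrics are directly linked. Substituting $M_{\text{limit}} = \max M$ into the waste definition gives

$$W = 1 - \frac{\bar{M}}{\max M} = 1 - \frac{1}{\text{PAR}} = 1 - \frac{1}{15.4} \approx 0.935,$$

which matches the pydicom figure. It also shows that any task with a high peak-to-average ratio wastes most of a peak-sized reservation.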

3. Realization across operating systems and control planes

AgentCgroup is deliberately expressed in terms of native OS primitives. On Linux, the AOS paper maps it to a dedicated subtree of the unified cgroups v2 hierarchy for each agent or agent class, with the cpu, memory, io, and pids controllers enabled. Within that subtree, CPU shares, quotas, and priorities are controlled per agent or agent class. The hierarchy may include per-plane subgroups, such as separate cgroups for reasoning workers and tool executors, which preserves least privilege and independent budgets. In systemd-managed deployments, AgentCgroup is realized as systemd slices and units under the unified v2 hierarchy: slices represent tenants or autonomy levels, and units represent individual agents and their subcomponents (Sharma et al., 1 Jun 2026).
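The following is a minimal sketch of how one agent's subtree might be laid out with raw cgroup v2 writes. The `agents/<agent_id>` layout, the `reasoning` and `tools` leaf names, and all budget values are hypothetical. The interface files (`cgroup.subtree_control`, `cpu.weight`, `cpu.max`, `memory.max`, `pids.max`) are standard cgroup v2 files. The code builds an inspectable write plan rather than touching the filesystem directly:

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # usual cgroup v2 mount point
CONTROLLERS = "+cpu +memory +pids"


def agent_subtree_plan(agent_id: str,
                       planes: dict[str, dict[str, str]]) -> list[tuple[str, str, str]]:
    """Return ordered (relative_dir, interface_file, value) writes for one agent.

    Controllers are delegated down to the per-plane leaves. Processes belong in
    those leaves, because cgroup v2's "no internal processes" rule forbids them
    in a cgroup that delegates controllers to its children.
    """
    base = f"agents/{agent_id}"  # hypothetical layout
    plan = [
        # Assumes the root cgroup already delegates these controllers.
        ("agents", "cgroup.subtree_control", CONTROLLERS),
        (base, "cgroup.subtree_control", CONTROLLERS),
    ]
    for plane, limits in planes.items():
        for interface_file, value in limits.items():
            plan.append((f"{base}/{plane}", interface_file, value))
    return plan


def apply_plan(plan: list[tuple[str, str, str]], root: Path = CGROUP_ROOT) -> None:
    """Create directories and write interface files (needs root or delegation)."""
    for rel_dir, interface_file, value in plan:
        directory = root / rel_dir
        directory.mkdir(parents=True, exist_ok=True)
        (directory / interface_file).write_text(value)


plan = agent_subtree_plan("agent-42", {
    # Reasoning plane: relative CPU weight (default 100) and its own memory cap.
    "reasoning": {"cpu.weight": "200", "memory.max": str(2 * 1024**3), "pids.max": "64"},
    # Tool plane: hard quota of half a core, larger memory cap, bounded fan-out.
    "tools": {"cpu.max": "50000 100000", "memory.max": str(4 * 1024**3), "pids.max": "256"},
})
for entry in plan:
    print(entry)
```

On a systemd host, the same layout would normally be expressed as slices and units rather than as raw writes under `/sys/fs/cgroup`.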

On Windows, AgentCgroup maps to one or more Job Objects that group agent processes and enforce CPU rate control, memory caps, I/O prioritization, and termination semantics. Tool executors run under restricted security tokens and, where applicable, inside AppContainer, Windows containers, or Hyper-V isolation. ETW and the Windows Event Log provide observability correlated with agent action identifiers from the AOS control plane (Sharma et al., 1 Jun 2026).

The Linux-specific AgentCgroup implementation adds a finer hierarchical structure aligned with tool-call boundaries. It assigns one parent cgroup per agent container and one child cgroup per tool call, so that per-call quotas, throttling, freezing, or killing can be applied and torn down with tool-call lifecycle. It uses cgroup v2 controls such as memory.high, memory.max, cgroup.freeze, cgroup.kill, memory.oom.group, and optionally memory.low to protect high-priority agent cgroups from reclaim pressure (Zheng et al., 10 Feb 2026).
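The per-call lifecycle can be sketched as an ordered list of cgroup v2 operations. The `call-<n>` naming, the parent path, the operation tuples, and the limit values are hypothetical. The interface files are standard cgroup v2 files. The kernel-side eBPF throttling is not modeled; the sketch covers only the setup and teardown a controller would perform around each tool call:

```python
Op = tuple  # ("mkdir", path) | ("write", path, value) | ("wait_unpopulated", path) | ("rmdir", path)


def tool_call_ops(agent_dir: str, call_id: int, pid: int,
                  high_bytes: int, max_bytes: int) -> tuple[list[Op], list[Op]]:
    """Return (setup, teardown) operations for one tool-call child cgroup."""
    child = f"{agent_dir}/call-{call_id}"  # hypothetical naming scheme
    setup = [
        ("mkdir", child),
        # Above memory.high the kernel throttles the call and reclaims memory.
        ("write", f"{child}/memory.high", str(high_bytes)),
        # Beyond memory.max the OOM killer acts inside this cgroup only.
        ("write", f"{child}/memory.max", str(max_bytes)),
        # On OOM, kill every process of the call together rather than one victim.
        ("write", f"{child}/memory.oom.group", "1"),
        # Move the tool process in last, once the limits are already in place.
        ("write", f"{child}/cgroup.procs", str(pid)),
    ]
    teardown = [
        # SIGKILL anything the tool left behind (e.g. daemonized children).
        ("write", f"{child}/cgroup.kill", "1"),
        # Killing is asynchronous: wait until cgroup.events reports "populated 0".
        ("wait_unpopulated", child),
        ("rmdir", child),
    ]
    return setup, teardown


def freeze_op(child: str, frozen: bool) -> Op:
    """Pause (1) or resume (0) every process in a tool-call cgroup."""
    return ("write", f"{child}/cgroup.freeze", "1" if frozen else "0")


setup, teardown = tool_call_ops("/sys/fs/cgroup/agents/agent-42", call_id=7, pid=12345,
                                high_bytes=512 * 1024**2, max_bytes=2 * 1024**3)
for op in setup + teardown:
    print(op)
```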

| Context | Realization | Main function |
|---|---|---|
| AOS on Linux | cgroups v2, namespaces, systemd slices/units | Agent-scoped resource intents, plane separation, audit anchoring |
| AOS on Windows | Job Objects, restricted tokens, ETW | Process grouping, caps, sandboxing, correlated observability |
| Linux in-kernel controller | Parent agent cgroup plus child tool-call cgroups, eBPF hooks | Per-call throttling, freeze/kill, adaptive isolation |
| Software-defined serving | Control plane, metrics plane, configurable data-plane shim | Group-level communication and serving control |

The serving formulation does not replace OS enforcement, but it extends the notion of AgentCgroup upward into a telemetry-driven control plane. Agents and tools register supported control parameters and APIs, and a logically centralized controller installs agent-level and request-level rules, including communication defaults, admission control, rerouting, speculative blocking, and deeper actions such as pause, throttle, reprioritization, or KV-cache transfer (Agarwal et al., 6 Jan 2026). This suggests that AgentCgroup can designate both an OS resource domain and a higher-level grouping abstraction, provided that enforcement remains tied to concrete system mechanisms.

4. Scheduling, memory semantics, and lifecycle management

AOS defines an agent scheduler that optimizes goal progress subject to cost budgets, risk, and SLAs. AgentCgroup is the mechanism that renders the scheduler’s outputs actionable in OS terms. For CPU, it applies shares, weights, quotas, rate limiting, and nice or priority controls to bounded reasoning slices and execution slices. For memory, it imposes per-cgroup limits to bound context and state and uses kernel memory-pressure signals to trigger deterministic degradation, such as shrinking context windows or pausing lower-priority agents. For I/O, it throttles reads and writes for tool executors while maintaining fairness across agents or tenants. For pids, it bounds process creation and fan-out and enforces deterministic termination semantics when agents are suspended or terminated (Sharma et al., 1 Jun 2026).
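Two of these translations can be sketched concretely. The first turns a CPU budget into a `cpu.max` line. The second turns a memory-pressure reading into one fixed degradation step. The function names, thresholds, and action labels are hypothetical. The PSI line format (`some`/`full` with `avg10`, `avg60`, `avg300`, `total`) is the standard format of cgroup v2 `*.pressure` files:

```python
def cpu_max_value(cpu_fraction: float, period_us: int = 100_000) -> str:
    """Render a CPU budget (fraction of one core) as a cgroup v2 cpu.max line."""
    if cpu_fraction <= 0:
        raise ValueError("cpu_fraction must be positive")
    # Quotas below 1 ms are rejected by the kernel, so clamp to that floor.
    quota_us = max(1_000, round(cpu_fraction * period_us))
    return f"{quota_us} {period_us}"


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse a PSI file such as memory.pressure into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {key: float(value) for key, value in (f.split("=") for f in fields)}
    return result


def degradation_action(psi_text: str, shrink_at: float = 10.0, pause_at: float = 40.0) -> str:
    """Map memory pressure to one fixed action, so equal inputs give equal outputs.

    Uses the 10 s "some" average: the share of time at least one task stalled on memory.
    """
    some_avg10 = parse_psi(psi_text)["some"]["avg10"]
    if some_avg10 >= pause_at:
        return "pause_low_priority_agents"
    if some_avg10 >= shrink_at:
        return "shrink_ephemeral_context"
    return "none"


sample = ("some avg10=12.50 avg60=4.10 avg300=1.00 total=123456\n"
          "full avg10=3.00 avg60=1.20 avg300=0.30 total=45678")
print(cpu_max_value(0.5))          # → 50000 100000
print(degradation_action(sample))  # → shrink_ephemeral_context
```

The decision depends only on an observable kernel signal and fixed thresholds, not on the agent's reasoning. This reflects the AOS requirement that scheduling depend on observable state and budgets.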

The AOS memory model decomposes agent memory into four classes: ephemeral context, durable agent memory, retrieved knowledge with provenance and classification labels, and append-only execution records. AgentCgroup governs the runtime processes associated with these classes and enforces isolation via namespaces and MAC. The key requirement is not deterministic reasoning but deterministic context construction and deterministic enforcement. Retrieval sets and filters must therefore be recorded, while memory pressure policies reduce ephemeral context sizes or pause lower-priority agents in a reproducible manner (Sharma et al., 1 Jun 2026).
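Deterministic context construction can be illustrated with a sketch that selects retrieved items under a token budget using a fixed tie-break, then records a digest of the selection. The `ContextItem` type, the greedy selection rule, and the SHA-256 digest are assumptions made for illustration. AOS does not specify them:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    item_id: str
    priority: int  # higher priority is kept first
    tokens: int


def build_context(candidates: list[ContextItem], token_budget: int) -> tuple[list[str], str]:
    """Select items under a budget; identical inputs always give identical output.

    Returns the selected ids and a SHA-256 digest of that selection, which can be
    written to the execution record so the same context can be rebuilt later.
    """
    # Sort by priority, then by id, so input order never affects the result.
    ordered = sorted(candidates, key=lambda c: (-c.priority, c.item_id))
    selected, used = [], 0
    for item in ordered:
        if used + item.tokens <= token_budget:
            selected.append(item.item_id)
            used += item.tokens
    digest = hashlib.sha256("\n".join(selected).encode()).hexdigest()
    return selected, digest


items = [ContextItem("doc-b", 1, 300), ContextItem("doc-a", 1, 300), ContextItem("task", 9, 500)]
print(build_context(items, 900)[0])  # → ['task', 'doc-a']
# Under memory pressure, a smaller budget shrinks the context reproducibly.
print(build_context(items, 600)[0])  # → ['task']
```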

The software-defined serving framework describes a parallel control vocabulary in terms of a runtime state $s_t$ and a policy mapping $\pi(s_t)$ that selects communication and serving parameters to evolve the observed state toward operator-specified targets. Its controller polls system-level metrics such as GPU or CPU utilization, memory pressure, and queue lengths, as well as application-level metrics including per-request latency, TTFT, TPT, and agent dependencies. It then applies request-level and agent-level policies through a configurable data-plane shim and a minimal set(parameter name, value) / reset(parameter name) interface exposed by heterogeneous agents and tools (Agarwal et al., 6 Jan 2026). In that environment, AgentCgroup-capable control includes switching between batching and streaming, blocking speculative requests until resources are available, throttling less critical members, assigning priorities and pacing strategies, rerouting requests across instances, and transferring KV-cache state during reroutes (Agarwal et al., 6 Jan 2026).
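The set/reset interface and the policy $\pi(s_t)$ can be sketched as a small controller loop. The parameter names (`mode`, `speculative`), the state keys, and the thresholds are hypothetical. The sketch only shows how a controller could override registered parameters under load and reset them afterward:

```python
class ControlPoint:
    """An agent or tool exposing the minimal set/reset control interface."""

    def __init__(self, name: str, defaults: dict[str, str]):
        self.name = name
        self._defaults = dict(defaults)
        self.params = dict(defaults)

    def exposed(self) -> list[str]:
        return list(self._defaults)

    def set(self, parameter: str, value: str) -> None:
        if parameter not in self._defaults:
            raise KeyError(f"{self.name} does not expose {parameter!r}")
        self.params[parameter] = value

    def reset(self, parameter: str) -> None:
        self.params[parameter] = self._defaults[parameter]


def policy(state: dict[str, float]) -> dict[str, str]:
    """pi(s_t): overrides to apply in state s_t (thresholds and names are hypothetical)."""
    if state["queue_len"] > 32 or state["memory_pressure"] > 0.8:
        return {"mode": "batch", "speculative": "block"}
    return {}


def control_step(points: list[ControlPoint], state: dict[str, float]) -> None:
    """One controller tick: set parameters the policy overrides, reset the rest."""
    overrides = policy(state)
    for point in points:
        for parameter in point.exposed():
            if parameter in overrides:
                point.set(parameter, overrides[parameter])
            else:
                point.reset(parameter)


planner = ControlPoint("planner", {"mode": "stream", "speculative": "allow"})
search = ControlPoint("search-tool", {"mode": "stream"})
control_step([planner, search], {"queue_len": 50, "memory_pressure": 0.3})
print(planner.params, search.params)
control_step([planner, search], {"queue_len": 4, "memory_pressure": 0.3})
print(planner.params, search.params)
```

Each member only receives the parameters it registered. This is how heterogeneous agents and tools can share one controller.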

Lifecycle management is explicit in AOS. When an agent is created, the lifecycle manager assigns stable identifiers and capabilities, and the scheduler provisions AgentCgroup instances for reasoning and execution. On suspend or terminate, the system performs deterministic cleanup: cancel in-flight tool requests, revoke tokens, close sessions, and persist a final audit closure record. The paper does not prescribe a naming scheme, but it requires correlating kernel-level events with AOS action identifiers. In practice, AgentCgroup paths and job-object names should therefore carry stable agent identity and task IDs, so that kernel events can be stitched to audit records and remain comprehensible to operators (Sharma et al., 1 Jun 2026).
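The following sketch pairs one possible naming convention with the cleanup order listed above. The path layout, the sanitizing rule, and the handler names are hypothetical, since the paper prescribes neither. Only the order of the four cleanup steps comes from the text:

```python
import re
from typing import Callable

_UNSAFE = re.compile(r"[^A-Za-z0-9_.-]")


def _component(part: str) -> str:
    """Make one path component safe: no "/" and no "." or ".." traversal."""
    cleaned = _UNSAFE.sub("_", part)
    if cleaned in {"", ".", ".."}:
        raise ValueError(f"unusable identifier: {part!r}")
    return cleaned


def agent_cgroup_path(tenant: str, agent_id: str, task_id: str,
                      root: str = "/sys/fs/cgroup/agents") -> str:
    """Hypothetical convention: identity is readable straight from the path."""
    return f"{root}/{_component(tenant)}/{_component(agent_id)}/task-{_component(task_id)}"


# Order taken from the lifecycle description: cancel, revoke, close, then audit.
CLEANUP_ORDER = (
    "cancel_inflight_tool_requests",
    "revoke_tokens",
    "close_sessions",
    "persist_audit_closure_record",
)


def deterministic_cleanup(handlers: dict[str, Callable[[], None]]) -> list[str]:
    """Run every cleanup step in fixed order; a missing handler raises KeyError."""
    completed = []
    for step in CLEANUP_ORDER:
        handlers[step]()
        completed.append(step)
    return completed


print(agent_cgroup_path("acme corp", "agent-42", "T-1001"))
# → /sys/fs/cgroup/agents/acme_corp/agent-42/task-T-1001
```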

5. Security, policy enforcement, and observability

AgentCgroup is also a policy-enforcement anchor. The AOS threat model includes prompt or context manipulation, tool misuse, privilege escalation, data exfiltration, supply-chain compromise, audit evasion, and control-plane compromise. Within this model, AgentCgroup participates in least-privilege execution and capability bounding by limiting what tool executors can do and how much they can consume. On Linux, this includes namespaces for pid, mount, net, and user, seccomp for syscall restriction, and SELinux or AppArmor for MAC; controlled egress forces all side-effecting network calls through brokered gateways. On Windows, restricted tokens drop privileges, ACLs enforce object-level access, application control policies and code integrity constrain execution, and Windows Firewall or WFP together with enterprise proxies mediate network access (Sharma et al., 1 Jun 2026).

The AOS enforcement principles are explicit: “Deny by default for side-effecting tools unless explicitly allowed by capability,” and “All tool calls must flow through mediation. This must be enforced by OS and network controls, not by convention.” Policy layering comprises static authorization, capability constraints, context policy, risk policy, and governance policy, including approvals and dual control. Delegation must be explicit, revocable, and attributed both to the agent and to the delegating principal. Safe failure modes prefer denial of side effects under policy or audit outages (Sharma et al., 1 Jun 2026).
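These principles can be combined in a compact decision gate: deny by default for side-effecting tools outside the capability set, first failing layer wins, and the outcome is audited before it is released. The `ToolRequest` type and the layer stubs are hypothetical. Only two of the five layers are stubbed here, and an audit-store failure is modeled as `OSError`:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolRequest:
    action_id: str
    agent_id: str
    tool: str
    side_effecting: bool
    capabilities: frozenset


# A layer returns None to pass, or (outcome, reason) with outcome "deny" or "defer".
Layer = Callable[[ToolRequest], Optional[tuple]]


def decide(request: ToolRequest, layers: list[tuple[str, Layer]], audit_log) -> str:
    """Deterministic decision: default deny, first failing layer wins, audit first."""
    if request.side_effecting and request.tool not in request.capabilities:
        outcome, reason = "deny", "default deny: tool not in capability set"
    else:
        outcome, reason = "allow", "all layers passed"
        for name, layer in layers:
            verdict = layer(request)
            if verdict is not None:
                outcome, reason = verdict[0], f"{name}: {verdict[1]}"
                break
    record = {"action_id": request.action_id, "agent_id": request.agent_id,
              "tool": request.tool, "outcome": outcome, "reason": reason}
    try:
        audit_log.append(record)  # the outcome is recorded before it is released
    except OSError:
        return "deny"  # safe failure: no audit record, no side effect
    return outcome


layers = [
    ("static_authorization",
     lambda r: None if r.agent_id == "agent-42" else ("deny", "unknown agent")),
    ("governance_policy",
     lambda r: ("defer", "dual control required") if r.tool == "prod.deploy" else None),
]
log = []
caps = frozenset({"git.push", "prod.deploy"})
print(decide(ToolRequest("a1", "agent-42", "git.push", True, caps), layers, log))     # → allow
print(decide(ToolRequest("a2", "agent-42", "shell.rm_rf", True, caps), layers, log))  # → deny
print(decide(ToolRequest("a3", "agent-42", "prod.deploy", True, caps), layers, log))  # → defer
```

In a real deployment, the OS and network controls described earlier are what force every tool call through a gate like this one. The gate by itself cannot prevent bypass.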

Observability combines resource metrics with decision lineage. AgentCgroup exposes cgroup statistics and procfs views for CPU, memory, and I/O; eBPF and auditd hooks for syscalls, network flows, and scheduler events correlated with AOS action IDs; and, on Windows, ETW and Security Audit logs for process, network, file, and registry events tied to agent operations. Audit invariants require that “All policy outcomes (allow, deny, defer) are recorded in an append-only audit record prior to rescheduling.” Deterministic replay focuses on enforcement pipelines and context reconstruction rather than on reproducing probabilistic reasoning (Sharma et al., 1 Jun 2026).
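Correlating resource metrics with action identifiers can be as simple as reading a cgroup's `cpu.stat` before and after an action and charging the difference to that action. This is a sketch that assumes the cgroup is dedicated to the action, as with one child cgroup per tool call. The record shape and function names are hypothetical. `cpu.stat` and `memory.stat` are standard flat-keyed cgroup v2 files:

```python
def parse_flat_keyed(text: str) -> dict[str, int]:
    """Parse a cgroup v2 flat-keyed file such as cpu.stat or memory.stat."""
    stats = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            stats[key] = int(value)
    return stats


def attribute_cpu(action_id: str, cgroup: str, before: str, after: str) -> dict:
    """Charge the CPU time consumed between two cpu.stat reads to one action ID."""
    used = parse_flat_keyed(after)["usage_usec"] - parse_flat_keyed(before)["usage_usec"]
    return {"action_id": action_id, "cgroup": cgroup, "cpu_usec": used}


before = "usage_usec 1000\nuser_usec 600\nsystem_usec 400"
after = "usage_usec 251000\nuser_usec 200600\nsystem_usec 50400"
print(attribute_cpu("act-17", "/sys/fs/cgroup/agents/agent-42/call-7", before, after))
```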

The Linux eBPF controller makes these enforcement and observability goals more concrete. Its kernel components include memcg_bpf_ops, which compute throttle delays on memory.high breaches; sched_ext, which biases scheduling toward latency-critical calls; and BPF maps that store per-cgroup state such as current RSS, allocation velocity, global policy parameters, and priority lists. A lightweight daemon initializes hierarchies, configures policies via pinned BPF maps, and collects telemetry such as RSS, delay counts, and fork/exec attribution for dashboards. The implementation is constrained by the eBPF verifier—bounded loops, checked pointer arithmetic, limited helper usage—and sched_ext provides fail-safe reversion to the default scheduler on errors (Zheng et al., 10 Feb 2026).

6. Evaluation, integration models, and open problems

AOS describes three integration models for AgentCgroup. As a user-space runtime, AgentCgroup is created and managed via cgroups v2 or Job Objects, while tool mediation is implemented as a sidecar or gateway and OS and network controls force all side effects through it. As an OS extension, stronger enforcement is added through LSM, eBPF, AppContainer, WFP, and audit integrations, while the primitive remains a cgroup or job object. As a distributed control plane, central scheduling and policy maintain agent identity, budgets, and governance, and node-level executors instantiate AgentCgroup on each host and report audit streams (Sharma et al., 1 Jun 2026).

The Linux-specific evaluation provides preliminary quantitative evidence for the in-kernel approach. In a multi-tenant replay of three real agent memory traces at 50× speed, under tight memory—1100 MB total versus about 1233 MB combined demand—the baseline OOM-killed one low-priority process, yielding 66% survival, whereas AgentCgroup’s in-kernel throttling triggered 239 delay events and achieved 100% survival with only +2.8% overhead for the high-priority workload. Under moderate memory—1300 MB total—it reduced high-priority P95 allocation latency by 29%, from 70.97 ms to 50.14 ms, while incurring P50 latency overhead of 0.3% and reducing total completion time by 1.1% (Zheng et al., 10 Feb 2026).
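As a consistency check on the reported percentages, and assuming the three replayed traces ran as three processes:

$$\frac{1233}{1100} \approx 1.12, \qquad \frac{2}{3} \approx 0.67, \qquad \frac{70.97 - 50.14}{70.97} = \frac{20.83}{70.97} \approx 0.29.$$

The tight configuration therefore oversubscribes memory by about 12%, the baseline keeps two of three processes alive (reported as 66%), and the P95 figures correspond to the stated 29% reduction.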

The software-defined serving evaluation addresses a different layer: communication and serving control. There, dynamic communication control improves throughput. Controller-driven load balancing with KV-cache transfer support performs 1.8× better than load balancing without hints, and the proposed solution performs 2.3× better than a baseline with no load balancing. Overall, the architecture improves serving throughput by up to 3.6× through fine-grained control of communication granularity, with deeper serving control delivering a further 2.3× improvement (Agarwal et al., 6 Jan 2026). These results do not substitute for OS isolation, but they demonstrate that AgentCgroup-style grouping and intent-driven control can extend above the kernel boundary into agentic serving systems.

Several limitations and open problems remain. The Linux implementation depends on memcg_bpf_ops, which is described as an RFC patch set, and sched_ext, which is upstream but relatively new; portability is therefore kernel-dependent. The characterization is based on one agent framework and SWE-rebench coding tasks, so other agent classes may differ. The prototype focuses on memory and CPU, leaving I/O and disk throttling, page-cache management, and NUMA awareness for future work (Zheng et al., 10 Feb 2026). At the AOS level, open research problems include formal agent scheduling for goal progress and fairness; memory semantics for deterministic context and provenance; verification boundaries for enforcement pipelines; deterministic trust-state transitions; cross-system identity and delegation with revocation; and multi-agent delegation chains with cascading revocation (Sharma et al., 1 Jun 2026). In programmable serving, open issues include declarative languages for agentic control, richer control-agent interfaces, scalable metrics infrastructure, and online policy adaptation for SLO management (Agarwal et al., 6 Jan 2026).

A common misconception is that AgentCgroup implies replacing the operating system or moving inference into the kernel. The AOS formulation explicitly rejects both positions. Another is that CPU scheduling alone captures the dominant contention mode of agent execution. The Linux characterization instead identifies memory as the concurrency bottleneck and tool-call bursts as the decisive source of volatility (Sharma et al., 1 Jun 2026, Zheng et al., 10 Feb 2026). Accordingly, AgentCgroup is best understood not as a single implementation, but as a systems pattern: a deterministic, auditable enforcement boundary that translates agent-level intent into concrete controls over CPU, memory, I/O, process creation, communication behavior, and side effects.
