
Agentic Kernel Generation

Updated 2 January 2026
  • Agentic kernel generation is an automated approach that employs autonomous LLM agents in iterative loops to generate, validate, and optimize computational kernels.
  • It leverages multi-modal feedback—including static analysis, JIT profiling, and empirical tests—to ensure functional correctness and performance improvement.
  • These systems support heterogeneous hardware and DSLs, achieving significant speedups and broad operator coverage through adaptive, closed-loop workflows.

Agentic kernel generation refers to the automated synthesis, validation, and optimization of computational kernels or system-level primitives using agentic workflows—primarily those driven by LLMs operating in iterative, feedback-driven, and often multi-agent loops. These systems transform kernel enablement from a static human-engineered activity to an adaptive, scalable, and context-sensitive process, targeting diverse hardware and software stacks from AI accelerators to OS subsystems (Hammond et al., 3 Dec 2025, Wang et al., 31 Jul 2025, Zhang et al., 23 Oct 2025, Liao et al., 29 Dec 2025, Du et al., 29 Dec 2025, Dong et al., 19 Oct 2025, Zhang et al., 19 Nov 2025, Zheng et al., 1 Sep 2025).

1. Core Concepts and Definitions

Agentic kernel generation systems are distinguished by their use of autonomous agents—typically implemented via LLMs—which iteratively generate, assess, and refine kernel implementations. Unlike naive prompt-based or one-shot code generation, these systems use closed feedback loops incorporating both programmatic and empirical checks:

  • Agentic loop: The central workflow, frequently modeled as a finite-state machine (FSM) or as a search (tree or graph) traversed by cooperative agents. Stages include code generation, static analysis, compilation, hardware execution, and feedback extraction.
  • Coverage orientation: Prioritization of correct functional coverage across large kernel/operator sets, supporting all data types, signature patterns, and argument shapes (Hammond et al., 3 Dec 2025).
  • Multi-modal feedback: Integration of static program checks (linting, AST analysis), dynamic runtime profiling (JIT, hardware counters), empirical correctness (test harnesses), and knowledge retrieval from documentation or historical experience (Zhang et al., 23 Oct 2025, Liao et al., 29 Dec 2025).
  • Iterative refinement: Use of LLM-based agents or subagents specialized for code synthesis, error diagnosis, optimization suggestion, or plan decomposition, operating in feedback loops inspired by human engineering workflows.
  • Heterogeneous and cross-platform support: Compatibility with multiple hardware backends (e.g., NVIDIA, AMD, Meta MTIA, NPUs, CPUs) and diverse kernel DSLs (Triton, CUDA, CuTe, TileLang) (Liao et al., 29 Dec 2025, Du et al., 29 Dec 2025).
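The agentic loop described above, modeled as a per-operator finite-state machine with feedback routed back to generation, can be sketched as follows. This is a minimal illustration; all stage callables (`generate`, `lint`, `compile_`, `execute`) are hypothetical stand-ins for the LLM and toolchain components:

```python
from enum import Enum, auto

class Stage(Enum):
    GENERATE = auto()   # LLM proposes a kernel candidate
    LINT = auto()       # static analysis / linting
    COMPILE = auto()    # JIT compilation
    EXECUTE = auto()    # run on hardware against a test harness
    DONE = auto()
    FAILED = auto()

def run_agentic_loop(generate, lint, compile_, execute, max_iters=5):
    """Drive one operator through the FSM.

    Each stage callable returns (ok, feedback); on failure, the extracted
    feedback conditions the next generation attempt.
    """
    feedback = None
    for _ in range(max_iters):
        candidate = generate(feedback)          # feedback-conditioned generation
        for check in (lint, compile_, execute):
            ok, feedback = check(candidate)
            if not ok:
                break                           # loop back to GENERATE with feedback
        else:
            return Stage.DONE, candidate
    return Stage.FAILED, None
```

In practice each stage emits structured artifacts (linter reports, compiler diagnostics, test logs) that a summarizer condenses before they re-enter the prompt.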

2. Architectures and Methodological Taxonomy

Several design archetypes for agentic kernel generation have converged in recent work:

| System | Architecture | Agents / Submodules | Hardware / DSLs |
|---|---|---|---|
| TritorX (Hammond et al., 3 Dec 2025) | FSM per operator | LLM generator, Linter, Compiler, Test harness, Log summarizer | Meta MTIA; Triton |
| GEAK (Wang et al., 31 Jul 2025) | Multi-agent pipeline | Generator, Evaluator, Reflector, Optimizer | AMD MI300X; Triton |
| CudaForge (Zhang et al., 23 Oct 2025) | Two-agent (Coder, Judge) | Correction, Optimization (hardware feedback) | NVIDIA GPUs; CUDA |
| KernelEvolve (Liao et al., 29 Dec 2025) | Graph search (universal operator) | Node selection, Universal operator, Eval, Retriever | NVIDIA, AMD, MTIA; Triton, CuTe, MLIR |
| AKG (Du et al., 29 Dec 2025) | Closed-loop, modular | Designer, Coder, Verifier, Conductor | Triton, CUDA-C, TileLang, CPP |
| STARK (Dong et al., 19 Oct 2025) | Tree search, multi-agent | Search controller, Plan agent, Code agent, Debug/Profiler | CUDA |
| AccelOpt (Zhang et al., 19 Nov 2025) | Beam-search loop | Planner, Executor, Summarizer, Memory | AWS Trainium; NKI |
| SchedCP (Zheng et al., 1 Sep 2025) | Multi-agent, decoupled OS | Observation, Planning, Execution, Learning | Linux eBPF; Schedulers |

Most implementations structure kernel generation as an iterative process: (1) candidate generation via an LLM (often context-conditioned), (2) static or dynamic formal verification, (3) JIT compilation or hardware execution, and (4) response-driven prompt or memory updates. Architectures range from explicit FSMs (Hammond et al., 3 Dec 2025), beam or tree search (Zhang et al., 19 Nov 2025, Dong et al., 19 Oct 2025, Liao et al., 29 Dec 2025), to multi-agent modular systems (Du et al., 29 Dec 2025, Wang et al., 31 Jul 2025, Zhang et al., 23 Oct 2025).
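Step (2) of this loop, verifying a candidate against a canonical backend within a tolerance, can be sketched as a small harness. Function names and the list-based I/O here are illustrative assumptions, not any particular system's API:

```python
import math

def functionally_correct(candidate_out, reference_out, eps=1e-5):
    """Elementwise check: every output element must satisfy |dev - cpu| < eps."""
    if len(candidate_out) != len(reference_out):
        return False
    return all(math.isfinite(c) and abs(c - r) < eps
               for c, r in zip(candidate_out, reference_out))

def passes_harness(kernel, reference, test_inputs, eps=1e-5):
    """An operator passes only if it matches the reference on every test input."""
    return all(functionally_correct(kernel(x), reference(x), eps)
               for x in test_inputs)
```

Real harnesses sweep data types, signature patterns, and argument shapes across the test inputs, so a single mismatch anywhere marks the operator as failing.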

3. Formal Decision Criteria and Optimization Objectives

Agentic kernel generation systems formalize correctness and fitness criteria as binary and continuous objectives grounded in hardware-realized execution:

  • Lint and static correctness: Candidate passes if all linter rules yield zero violations:

$$\text{lint\_ok} = \bigwedge_{r \in \mathcal{R}} (\text{violations}_r = 0)$$

  • Functional correctness: Operator passes if outputs match a canonical backend within $\epsilon$ across all relevant test inputs:

$$P_{op,t} = \begin{cases} 1 & \text{if } |\text{dev} - \text{cpu}| < \epsilon \\ 0 & \text{otherwise} \end{cases}$$

  • Coverage: Fraction of operators or benchmarks with complete pass rates; $S_{\text{op}} = 1$ indicates full correctness.
  • Performance objectives: Speedup relative to reference implementation, e.g., TritorX’s fitness:

$$\mathcal{F}(v) = \frac{t_{\text{pytorch}}}{t_{\text{triton}}}$$

  • Termination: Completion upon reaching target coverage, improvement stall, or artifact budget exhaustion:

$$\tau(G_t) = (|V_t| \geq N_{\text{max}}) \vee (\exists v : \mathcal{F}(v) \geq F^*) \vee (\text{stall\_count} \geq M)$$
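The fitness and termination criteria above translate directly into code. A minimal sketch, with the thresholds ($N_{\max}$, $F^*$, $M$) as assumed example values rather than values from any cited system:

```python
def fitness(t_pytorch, t_triton):
    """Speedup F(v) of the generated Triton kernel over the PyTorch reference."""
    return t_pytorch / t_triton

def should_terminate(num_candidates, fitnesses, stall_count,
                     n_max=100, f_target=1.5, m_stall=10):
    """tau(G_t): stop on budget exhaustion, target fitness reached, or stall."""
    return (num_candidates >= n_max                     # |V_t| >= N_max
            or any(f >= f_target for f in fitnesses)    # exists v: F(v) >= F*
            or stall_count >= m_stall)                  # stall_count >= M
```

A fitness above 1.0 means the candidate beats the reference implementation; the stall counter tracks consecutive iterations without improvement.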

Agent selection and expansion often use softmax, $\epsilon$-greedy, or Monte Carlo Tree Search policies over observed fitness or coverage scores (Liao et al., 29 Dec 2025, Dong et al., 19 Oct 2025). In evaluation, systems report metrics such as median speedup, percent exceeding baseline, pass@K, and per-operator correctness.
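Two of these candidate-selection policies can be sketched over a list of observed fitness scores. This is an illustrative implementation of the generic policies, not code from any of the cited systems:

```python
import math, random

def softmax_select(fitnesses, temperature=1.0, rng=random):
    """Sample a candidate index with probability proportional to exp(F/T)."""
    weights = [math.exp(f / temperature) for f in fitnesses]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(fitnesses) - 1          # guard against float rounding

def epsilon_greedy_select(fitnesses, epsilon=0.1, rng=random):
    """Exploit the best-known candidate with prob. 1-epsilon, else explore."""
    if rng.random() < epsilon:
        return rng.randrange(len(fitnesses))
    return max(range(len(fitnesses)), key=fitnesses.__getitem__)
```

Lower temperatures (or smaller $\epsilon$) bias the search toward exploiting the current best candidate; higher values spread expansion across the frontier.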

4. Feedback Mechanisms and Context Management

Effective agentic kernel pipelines depend on multi-level and multi-modal feedback: static program checks (linting, AST analysis), dynamic runtime profiling (JIT statistics, hardware counters), empirical correctness signals from test harnesses, and knowledge retrieved from documentation or prior runs.

Context management strategies include prompt truncation, focused tokenization (e.g., bottleneck-extracted artifacts only), and dynamic context windows specific to each agent’s role (planning, coding, debugging) (Dong et al., 19 Oct 2025).
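A simple form of the truncation strategy, keeping only the most recent feedback artifacts that fit an agent's token budget, can be sketched as follows (the budget and the length-based token estimator are illustrative assumptions):

```python
def build_context(history, token_budget=2048,
                  estimate_tokens=lambda s: len(s) // 4):
    """Keep the newest feedback artifacts that fit in the agent's window."""
    kept, used = [], 0
    for entry in reversed(history):          # walk from newest to oldest
        cost = estimate_tokens(entry)
        if used + cost > token_budget:
            break                            # older entries no longer fit
        kept.append(entry)
        used += cost
    return list(reversed(kept))              # restore chronological order
```

Role-specific windows then apply different budgets and filters, e.g. a debugging agent may receive only bottleneck-extracted artifacts while a planning agent sees higher-level summaries.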

5. Empirical Evaluation and Benchmark Results

Scalable agentic kernel systems report empirical results along the metrics defined above: median speedup over reference implementations, the fraction of kernels exceeding baseline performance, pass@K, and per-operator correctness and coverage across large operator sets.

6. Key Design Trade-offs and Future Directions

Possible directions include reinforcement or active learning for plan selection, embedding-based retrieval for optimization memory, multi-platform joint search, and coordinated agentic optimization across kernel, OS, and system stack subsystems (Liao et al., 29 Dec 2025, Zheng et al., 1 Sep 2025, Zhang et al., 19 Nov 2025).

7. Broader Context and Philosophical Underpinnings

The agentic kernel generation paradigm signifies a transition from monolithic, single-shot code generation toward open-ended, self-adaptive, and robust code synthesis systems, moving beyond traditional AI-hardware co-design cycles (Hammond et al., 3 Dec 2025, Liao et al., 29 Dec 2025, Du et al., 29 Dec 2025). The term "agentic kernel" also echoes research in cognitive architectures, where a minimal "functional kernel" enables autonomous emergence of higher-level cognitive functions through reflexive, schema-based self-organization (Serov, 2022). This analogy underscores the trajectory of future agentic kernel platforms: to provide the substrate from which both routine and emergent computation can be self-organized and optimized—potentially closing the last-mile gap in hardware–software co-evolution.
