Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization

Published 28 Apr 2026 in cs.AI and cs.AR | (2604.25083v1)

Abstract: Rapid advances in LLMs create new opportunities by enabling efficient exploration of broad, complex design spaces. This is particularly valuable in computer architecture, where performance depends on microarchitectural designs and policies drawn from vast combinatorial spaces. We introduce Agentic Architect, an agentic AI framework for computer architecture design exploration and optimization that combines LLM-driven code evolution with cycle-accurate simulation. The human architect specifies the optimization target, seed design, scoring function, simulator interface, and benchmark split, while the LLM explores implementations within these constraints. Across cache replacement, data prefetching, and branch prediction, Agentic Architect matches or exceeds state-of-the-art designs. Our best evolved cache replacement design achieves a 1.062x geomean IPC speedup over LRU, 0.6% over Mockingjay (1.056x). Our evolved branch predictor achieves a 1.100x geomean IPC speedup over Bimodal, 1.5% over its Hashed Perceptron seed (1.085x). Finally, our evolved prefetcher achieves a 1.76x geomean IPC speedup over no prefetching, 17% over its VA/AMPM Lite seed (1.59x) and 21% over SMS (1.55x). Our analysis surfaces several findings about agentic AI-driven microarchitecture design. Across evolved designs, components often correspond to known techniques; the novelty lies in how they are coordinated. The architect's role is shifting, but the human remains central. Seed quality bounds what search can achieve: evolution can refine and extend an existing mechanism, but cannot compensate for a weak foundation. Likewise, objectives, constraints, and prompt guidance affect reliability and generalization. Overall, Agentic Architect is the first end-to-end open-source framework for agentic AI architecture exploration and optimization.

Authors (3)

Summary

The paper presents the Agentic Architect framework that leverages LLM-driven evolution to systematically optimize microarchitectural policies.
It employs automated code mutation, cycle-accurate simulation, and prompt engineering to improve cache replacement, prefetching, and branch prediction.
The framework reduces manual design effort while matching or exceeding state-of-the-art designs in geomean IPC across all three domains.

Introduction and Motivation

Agentic Architect applies LLM-driven, agentic AI to microarchitecture design space exploration, targeting last-level cache (LLC) replacement, hardware prefetching, and branch prediction. The motivation is the vast combinatorial design space of CPU microarchitectural policies: exploring it has historically required years of specialized manual effort, and traditional human-centered methods have reached diminishing returns. Building on recent LLM-based evolutionary frameworks, Agentic Architect enables systematic, high-throughput exploration and refinement of policies through automated code mutation, simulation-driven evaluation, and iterative feedback that encodes high-level architectural objectives.

Figure 1: An overview of the Agentic Architect framework, indicating human-specified inputs and the LLM-driven evolution loop.

Agentic Architect’s structured loop decouples human expertise from repeated implementation labor. The architect specifies the system prompt, seed policy, evaluator function, and trace database; the LLM acts as a mutation operator, generating candidate policies that are vetted via compilation gating and cycle-accurate simulation, followed by informed scoring and selection. The framework is architecturally modular, supporting domain-independent interfaces, interchangeable evolutionary back-ends (notably OpenEvolve and AdaEvolve), and compatibility with any simulator exposing hook points for policy optimization.
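
The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the framework's actual code: `mutate`, `compiles`, and `simulate` are hypothetical stand-ins for the LLM call, the compilation gate, and the cycle-accurate simulator, candidates are plain strings, and selection is simple greedy hill climbing rather than the population-based strategies a real evolutionary back-end would use.

```python
import random
from typing import Callable, Optional


def evolve(
    seed: str,
    mutate: Callable[[str], str],                # stand-in for the LLM mutation operator
    compiles: Callable[[str], bool],             # stand-in for the compilation gate
    simulate: Callable[[str], Optional[float]],  # returns a score, or None on timeout
    iterations: int = 20,
) -> tuple[str, float]:
    """Greedy loop: keep the best-scoring candidate seen so far."""
    best_score = simulate(seed)
    if best_score is None:
        raise ValueError("seed policy must simulate successfully")
    best = seed
    for _ in range(iterations):
        candidate = mutate(best)
        if not compiles(candidate):
            continue  # rejected by the compilation gate
        score = simulate(candidate)
        if score is None:
            continue  # rejected by the simulation deadline
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Toy stand-ins (hypothetical): a "policy" is a string, "!" breaks the build,
# overly long policies time out, and the score counts occurrences of "a".
def toy_mutate(policy: str) -> str:
    return policy + random.choice("ab!")


def toy_compiles(policy: str) -> bool:
    return "!" not in policy


def toy_simulate(policy: str) -> Optional[float]:
    if len(policy) > 12:
        return None
    return float(policy.count("a"))
```

Calling `evolve("a", toy_mutate, toy_compiles, toy_simulate)` exercises both rejection paths: candidates containing `!` never reach simulation, and candidates that exceed the length limit are dropped as timeouts.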

Framework Components and Design Principles

Agentic Architect’s evolutionary agent iteratively mutates candidate policy code using an LLM, passing only code that compiles and completes within a bounded simulation deadline to the evaluation phase. The agent itself is domain-agnostic: the prompt and seed policy instantiate the search space for the target microarchitectural component. This modularity lets the same loop be applied to different policy code structures and hardware constraints.

The seed policy, scoring function, trace set, and prompt strategy are all human-controlled, and together they determine the design envelope. The framework encodes architectural intent explicitly in the scoring function, which enables transparent multi-objective optimization (e.g., end-to-end IPC constrained by domain-specific penalties such as LLC misses or MPKI). Prompt engineering is shown empirically to have a nontrivial effect on solution quality and diversity: minimal prompts, which avoid naming legacy techniques or algorithmic recipes, yield more diverse and effective exploration than heavily prescriptive ones.
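
As an illustration of encoding intent explicitly in the scoring function, the sketch below rewards IPC speedup and subtracts a penalty when MPKI regresses against the baseline. The functional form, the parameter names, and the `penalty_weight` default are assumptions made for this example; the source does not specify the actual scoring function.

```python
def score(
    ipc: float,
    baseline_ipc: float,
    mpki: float,
    baseline_mpki: float,
    penalty_weight: float = 0.1,  # assumed weight, not taken from the paper
) -> float:
    """IPC speedup minus a penalty for MPKI regressions (hypothetical form)."""
    if baseline_ipc <= 0:
        raise ValueError("baseline_ipc must be positive")
    speedup = ipc / baseline_ipc
    if baseline_mpki > 0:
        # Only regressions are penalized; MPKI improvements earn no bonus here.
        regression = max(0.0, (mpki - baseline_mpki) / baseline_mpki)
    else:
        regression = 0.0
    return speedup - penalty_weight * regression
```

Keeping the penalty as a separate, named term makes the trade-off visible to whoever reviews the search: a candidate that gains IPC on the training traces by inflating misses scores lower than one with the same IPC and no regression.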

Evaluation in Microarchitecture: Experimental Scope and Strong Results

Agentic Architect is instantiated in three domains: LLC replacement, hardware prefetching, and branch prediction. Seeds are drawn from state-of-the-art designs: Mockingjay (re-use distance prediction) for cache replacement, VA/AMPM Lite (compact access map) for prefetching, and Hashed Perceptron for branch prediction. Metrics are end-to-end: geomean IPC speedup over a relevant baseline (e.g., LRU or Bimodal), measured with ChampSim on a diverse suite of SPEC CPU benchmarks.
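
The headline metric, geomean IPC speedup over a baseline, can be computed as in the sketch below. The function name and the dictionary layout (trace name to IPC) are assumptions for illustration; the trace names in any call are placeholders, not the paper's benchmark list.

```python
import math


def geomean_speedup(
    policy_ipc: dict[str, float], baseline_ipc: dict[str, float]
) -> float:
    """Geometric mean of per-trace IPC ratios (policy / baseline)."""
    if not policy_ipc:
        raise ValueError("no traces given")
    if policy_ipc.keys() != baseline_ipc.keys():
        raise ValueError("policy and baseline must cover the same traces")
    # Average in log space, which is the standard way to form a geometric mean.
    log_sum = sum(
        math.log(policy_ipc[trace] / baseline_ipc[trace]) for trace in policy_ipc
    )
    return math.exp(log_sum / len(policy_ipc))
```

The geometric mean is used for ratios because a 2x gain on one trace and a 0.5x loss on another cancel to 1.0, whereas an arithmetic mean would report a net gain.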

Figure 3: Geomean IPC speedup for (left) cache replacement, (center) prefetching, and (right) branch prediction—comparing baseline, SOTA human designs, and Agentic Architect-evolved policies.

Key quantitative results:

Cache Replacement: The evolved design achieves a 1.062x geomean IPC speedup over LRU, 0.6 percentage points above Mockingjay (1.056x).
Prefetching: The evolved prefetcher achieves 1.76x over no prefetching, 17 percentage points above the VA/AMPM Lite seed (1.59x) and 21 percentage points above SMS (1.55x).
Branch Prediction: The evolved predictor achieves 1.100x over Bimodal, 1.5 percentage points above the Hashed Perceptron seed (1.085x); on gobmk, MPKI falls by 39%, yielding a 14.6% IPC gain.

Figure 2: Per-trace breakdown of IPC speedup for cache replacement, highlighting gains on memory-intensive traces and parity on compute-bound ones.
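
The gaps quoted in the key results correspond to differences between speedup factors (percentage points of the baseline IPC), not to ratios between the evolved and reference designs. The short calculation below makes the distinction concrete using only the speedup figures from the abstract; the helper names are introduced here for illustration.

```python
def point_gap(evolved: float, reference: float) -> float:
    """Difference between two speedup factors, in percentage points."""
    return (evolved - reference) * 100


def relative_gain(evolved: float, reference: float) -> float:
    """Gain of the evolved design relative to the reference, in percent."""
    return (evolved / reference - 1) * 100


# (evolved speedup, reference speedup), both measured against the same baseline.
RESULTS = {
    "cache replacement vs. Mockingjay": (1.062, 1.056),
    "prefetching vs. VA/AMPM Lite seed": (1.76, 1.59),
    "prefetching vs. SMS": (1.76, 1.55),
    "branch prediction vs. Hashed Perceptron seed": (1.100, 1.085),
}

for name, (evolved, reference) in RESULTS.items():
    print(
        f"{name}: {point_gap(evolved, reference):.1f} points, "
        f"{relative_gain(evolved, reference):.1f}% relative"
    )
```

For prefetching, the 17-point gap over the seed corresponds to a relative gain of about 10.7%, and the 21-point gap over SMS to about 13.5%. For cache replacement and branch prediction the two measures nearly coincide because the speedups are close to 1.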

Figure 4: Per-trace IPC speedup for prefetching; Agentic Architect dominates across diverse access patterns.

Figure 5: Branch prediction per-trace performance, with dramatic improvements on highly sensitive workloads.

Agentic Architect’s evolved cache policies synthesize multiple known techniques (ensemble of RDP, PC-transition, set-context predictors; regret tracking; adaptive aging/thrashing detection) with novel runtime arbitration, outperforming baselines specifically on memory-bottlenecked traces. For prefetching, the evolved architecture integrates six distinct engines (stream, correlation, stride, delta, spatial, global), with epoch-based self-tuning and MSHR-aware throttling—mechanisms rarely found together in prior designs.

Analysis of Evolutionary Effectiveness and Human Factors

Figure 6: Comparative performance of OpenEvolve and AdaEvolve across domains; both evolutionary strategies are performant, but the coupling of LLM mutation and simulation is the main driver.

Prompt strategy has a measurable effect on solution quality and search efficiency. Minimal prompts elicit more innovative combinations and avoid performance-degrading complexity. Naming specific legacy algorithms in the prompt increased the frequency of expensive or unscalable logic, harming both search productivity and overall architectural diversity.

Figure 9: Minimal vs. full prompt comparison for branch prediction: minimal prompts converge faster to higher evaluator scores, discovering more best-in-run candidates.

Model comparison experiments (Opus, Kimi K2.5, Codex, Gemini 2.5) indicate that more advanced LLMs yield superior evolved policies and higher code validity, albeit at significantly increased cost per iteration. Compilation gating (rejecting non-compiling mutants) and simulation timeouts (blocking high-latency logic) are critical for ensuring productive search.
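
Compilation gating and simulation timeouts can both be expressed as one guarded subprocess call, as in the sketch below. It assumes the build and the simulator are external commands; `run_gated` and the demo command lists are hypothetical names, and the Python interpreter stands in for the real compiler and simulator so the example runs anywhere.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_gated(cmd: Sequence[str], timeout_s: float) -> Optional[str]:
    """Run a build or simulation command.

    Returns stdout on success, or None if the command exits non-zero
    (e.g. a compile error) or exceeds the deadline.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None  # candidate is too slow to simulate; discard it
    return done.stdout if done.returncode == 0 else None


# Demo commands: stand-ins for a successful build, a failed build, and a
# simulation that would overrun its deadline.
FAST_CMD = [sys.executable, "-c", "print('ok')"]
FAILING_CMD = [sys.executable, "-c", "import sys; sys.exit(1)"]
SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(5)"]
```

With these stand-ins, `run_gated(FAST_CMD, 10)` returns the command's output, while `run_gated(FAILING_CMD, 10)` and `run_gated(SLOW_CMD, 0.5)` both return `None`, so a search loop can treat build failures and timeouts uniformly as rejected candidates.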

Generalization is domain- and trace-dependent. Prefetching evolved from only three traces generalized to the full benchmark set with no degradation. In contrast, cache replacement and branch prediction gains are clustered on specific, sensitive traces. Effective trace selection and diversity are necessary to avoid overfitting and ensure broad applicability.

Figure 7: Training vs. held-out trace geomean IPC for evolved designs; strong generalization in prefetching, some concentration of gains in replacement and branch prediction.
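
A simple way to check generalization of this kind is to report the geomean speedup separately for training and held-out traces. The sketch below assumes per-trace speedups are already available as a dictionary; the function name and any trace names used with it are placeholders.

```python
from statistics import geometric_mean


def split_geomeans(
    speedups: dict[str, float], train: set[str]
) -> tuple[float, float]:
    """Return (training geomean, held-out geomean) of per-trace speedups."""
    train_vals = [s for trace, s in speedups.items() if trace in train]
    held_vals = [s for trace, s in speedups.items() if trace not in train]
    if not train_vals or not held_vals:
        raise ValueError("need at least one training and one held-out trace")
    return geometric_mean(train_vals), geometric_mean(held_vals)
```

A held-out geomean close to the training geomean suggests the evolved policy generalizes; a large gap suggests the gains are concentrated on the traces used during search.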

Structural Patterns in Evolved Architectures

A consistent ensemble pattern emerges in evolved policies:

Preservation of the seed’s core mechanism.
Augmentation with orthogonal predictive features (e.g., adding context predictors, additional engines, local–global history tables).
Novel runtime coordination of diverse features (e.g., dynamic arbitration, phase detection, disabling engines).
Phase-adaptive behavior, adjusting internal policy depending on workload or observed runtime metrics.

Figure 8: Architecture growth from seed to evolved design—illustrating code and storage complexity scaling across domains.
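
The ensemble-plus-arbitration pattern can be illustrated with a toy arbiter that tracks a saturating confidence counter per component and follows whichever component has been most accurate recently. This is a generic sketch of the pattern, not a reconstruction of any evolved design; the class name, counter width, and the two example predictors are assumptions.

```python
from typing import Callable, Sequence


class Arbiter:
    """Selects among component predictors using saturating confidence counters."""

    def __init__(
        self, components: Sequence[Callable[[int], int]], max_conf: int = 7
    ) -> None:
        self.components = list(components)
        self.max_conf = max_conf
        self.conf = [max_conf // 2] * len(self.components)
        self._last: list[int] = []

    def predict(self, x: int) -> int:
        """Query every component, then return the most trusted one's output."""
        self._last = [component(x) for component in self.components]
        # Ties go to the component listed first (e.g. the seed's mechanism).
        best = max(range(len(self.components)), key=lambda i: self.conf[i])
        return self._last[best]

    def update(self, actual: int) -> None:
        """Reward components whose last prediction was right, punish the rest."""
        for i, guess in enumerate(self._last):
            if guess == actual:
                self.conf[i] = min(self.max_conf, self.conf[i] + 1)
            else:
                self.conf[i] = max(0, self.conf[i] - 1)


# Two toy address predictors: next-line and a fixed stride of 4.
def next_line(addr: int) -> int:
    return addr + 1


def stride_four(addr: int) -> int:
    return addr + 4
```

On an access stream with stride 4, `Arbiter([next_line, stride_four])` first follows `next_line`, then switches to `stride_four` after a single `update`, which mirrors in miniature how runtime arbitration lets an added component take over when the seed's mechanism mispredicts.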

While component algorithms are largely drawn from the literature (e.g., predictors, prefetching engines, state tracking), the particular combinations, arbitration, and adaptivity mechanisms are novel, reflecting a synthesis that is difficult to reach by manual design given the combinatorial size of the search space.

The complexity of evolved policies (code size and storage cost) can be substantially higher than that of seeds or state-of-the-art references. In prefetching, the evolved policy is smaller and more performant than the best published design (Berti), whereas replacement and branch prediction trade area for incremental performance. Resource-bound deployments therefore call for explicit area, energy, or hardware cost constraints in the fitness function.

Figure 13: Storage–performance Pareto frontier reveals where evolved designs offer best-in-class performance at a given storage cost.
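
A storage-performance Pareto frontier of this kind can be extracted with a single sorted pass, as sketched below. The function name and the `(storage, speedup)` tuple layout are assumptions, and any numbers passed in are the caller's own; none are taken from the paper.

```python
def pareto_frontier(designs: dict[str, tuple[float, float]]) -> list[str]:
    """Return names of non-dominated designs, ordered by increasing storage.

    `designs` maps a design name to (storage_cost, speedup). A design is kept
    only if it is faster than every design that costs no more storage.
    """
    frontier: list[str] = []
    best_speedup = float("-inf")
    # Sort by storage ascending; at equal storage, the faster design comes first.
    ordered = sorted(designs.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    for name, (_storage, speedup) in ordered:
        if speedup > best_speedup:
            frontier.append(name)
            best_speedup = speedup
    return frontier
```

For example, with made-up inputs `{"A": (1, 1.0), "B": (2, 0.9), "C": (4, 1.2), "D": (4, 1.1)}` the frontier is `["A", "C"]`: B costs more than A but is slower, and D is dominated by C at the same storage.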

Human Agent’s Role and Remaining Limitations

The extent of improvement is bounded by the quality of the seed and by the human’s design of the scoring function and trace sampling. Evolutionary search can refine and extend an existing mechanism, but it cannot compensate for a weak foundation. The codified expertise resides in:

Seed policy selection: constrains maximum attainable performance;
Scoring function balancing: governs optimization path and generalization;
Trace set curation: ensures robustness and prevents workload-specific overfitting;
Prompt engineering: enables or restricts the expressive power of the LLM evolutionary agent.

Domain expertise thus shifts from low-level policy implementation to high-level specification of the design envelope.
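
One way to picture this envelope specification is as a single validated configuration object that gathers the human-controlled inputs listed above. The sketch below is hypothetical: `SearchSpec` and its field names are invented for illustration and do not reflect the framework's actual interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SearchSpec:
    """Human-specified inputs to an evolutionary search (hypothetical layout)."""

    seed_policy_path: str                      # source file of the seed design
    system_prompt: str                         # guidance given to the LLM
    score: Callable[[dict[str, float]], float]  # maps simulator metrics to a score
    train_traces: tuple[str, ...]              # traces used during search
    held_out_traces: tuple[str, ...] = ()      # traces reserved for validation

    def __post_init__(self) -> None:
        if not self.train_traces:
            raise ValueError("at least one training trace is required")
        overlap = set(self.train_traces) & set(self.held_out_traces)
        if overlap:
            raise ValueError(f"traces in both splits: {sorted(overlap)}")
```

Rejecting overlapping splits at construction time keeps held-out results meaningful, since a trace that appears in both sets would make generalization look better than it is.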

Implications and Outlook

Agentic Architect demonstrates that LLM-driven agentic search, when coupled with cycle-accurate evaluation and steering by a domain specialist, can match or exceed the IPC of current SOTA microarchitectural components. The approach extends beyond the three domains studied to any architectural component expressible as evaluable code, including coherence, NoC routing, OS co-design, or accelerator dataflow.

The practical implications are immediate: rapid exploration of new architectural mechanisms, synthesis of multi-component adaptive policies, and scalable benchmarking against SOTA. Conceptually, Agentic Architect narrows the gap between program synthesis and systems design by making agentic co-design a repeatable workflow. Future work includes co-evolution of interacting policy modules, explicit hardware-aware multi-objective optimization (storage, power), and meta-evolution, in which agents optimize the architectural search strategies themselves.

Conclusion

Agentic Architect establishes an extensible, open-source framework for agentic AI-driven architecture design, matching or exceeding state-of-the-art policies in cache replacement, prefetching, and branch prediction, with the largest gain in prefetching and smaller but measurable gains in the other two domains. The architect’s role becomes strategic, focusing on specifying the envelope within which agentic evolution operates. As LLMs and agentic optimization frameworks advance, such platforms are positioned to become integral to computer architecture research and hardware/software co-design.
