ArchGym: ML-Based Architecture Exploration
- ArchGym is an open-source, extensible framework for ML-driven architectural design space exploration, offering unified APIs and integrated baselines for fair benchmarking.
- It employs an environment–agent loop supporting diverse search strategies like RL, BO, GA, ACO, and random sampling, highlighting the crucial role of hyperparameter tuning.
- By standardizing experiment protocols and incorporating proxy models for simulation acceleration, ArchGym advances reproducibility and efficiency in architectural research.
ArchGym is an open-source, extensible framework designed to enable fair, reproducible, and objective comparison of ML-assisted algorithms for architectural design space exploration. Addressing the complexity stemming from the high dimensionality and combinatorial explosion of modern hardware architecture configuration spaces, ArchGym provides unified APIs and integrated baselines, thereby facilitating ML algorithm selection, hyperparameter tuning, benchmarking, and data collection for downstream research (Krishnan et al., 2023).
1. Motivation and Scope
Design-space exploration for domain-specific architectures (such as memory controllers, deep neural network (DNN) accelerators, and AR/VR system-on-chip (SoC) platforms) involves tuning dozens of discrete and continuous architectural "knobs," so the number of candidate configurations grows combinatorially and brute-force search becomes infeasible. Traditional performance evaluation using cycle-accurate or register-transfer-level (RTL) simulators is computationally expensive, which imposes tight sample budgets on any search algorithm.
Multiple ML-based optimizers have been proposed: reinforcement learning (RL), Bayesian optimization (BO), genetic and ant-colony algorithms, and random baselines. However, the lack of a common experimental environment, detailed hyperparameter treatments, and standardized benchmarks significantly impedes objective algorithm selection and slows research progress (Krishnan et al., 2023).
2. Framework Overview and API Design
ArchGym abstracts each design-space exploration problem as a standard "environment–agent" loop, mirroring the OpenAI Gym interface. The two key entities are:
- Environment: Wraps the target cost model, which can be an analytical model, high-fidelity simulator, ML-driven proxy, or physical hardware, alongside a suite of workloads.
- Agent: Encapsulates the search strategy, parameterized by its own policy representation (e.g., neural networks, genomes, surrogate models, pheromone matrices) and hyperparameters.
Core API methods mirror the traditional RL paradigm:
- reset() → returns the initial observation
- step(action) → returns (next_observation, reward, done, info)
Action spaces are flexible, supporting both discrete and continuous dimensions to encode architecture parameters (e.g., buffer_size ∈ {1,2,4,8}, PE_count ∈ [1,256]). The reward is a scalar that represents a user-defined objective, such as minimizing latency, energy, or area. Non-RL agents typically ignore observations but use the reward directly as a fitness measure.
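The loop described above can be illustrated with a minimal sketch. Everything here is hypothetical: ToyArchEnv, its made-up analytical cost model, and the helper functions are stand-ins for illustration, not ArchGym's actual classes. Only the reset()/step() shape and the two example knobs come from the text.

```python
import random


class ToyArchEnv:
    """Hypothetical stand-in for an ArchGym-style environment (not the real API)."""

    # Knob ranges taken from the examples in the text.
    BUFFER_SIZES = (1, 2, 4, 8)
    PE_RANGE = (1, 256)

    def __init__(self, max_steps=20):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.steps = 0
        return {"latency": 0.0}

    def step(self, action):
        """Evaluate one design point; return (next_observation, reward, done, info)."""
        self.steps += 1
        # Made-up cost model: more PEs and larger buffers reduce latency.
        latency = 1000.0 / action["PE_count"] + 50.0 / action["buffer_size"]
        reward = -latency  # lower latency gives a higher reward
        done = self.steps >= self.max_steps
        return {"latency": latency}, reward, done, {"action": action}


def random_action(rng):
    """Sample one design point uniformly, as a random-walker baseline would."""
    return {
        "buffer_size": rng.choice(ToyArchEnv.BUFFER_SIZES),
        "PE_count": rng.randint(*ToyArchEnv.PE_RANGE),
    }


def run_episode(env, rng):
    """Run one episode and return the best (reward, action) pair seen."""
    env.reset()
    best_reward, best_action, done = float("-inf"), None, False
    while not done:
        action = random_action(rng)
        _, reward, done, _ = env.step(action)
        if reward > best_reward:
            best_reward, best_action = reward, action
    return best_reward, best_action


best_reward, best_action = run_episode(ToyArchEnv(max_steps=20), random.Random(0))
print(best_reward, best_action)
```

Because the agent only needs the scalar reward, the same loop works unchanged if the toy cost model is swapped for a simulator call.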
ArchGym provides plug-in support for new agents (by subclassing and implementing select_action and update) and environments (by extending the environment base class with required API methods). All (observation, action, reward) trajectories are logged into extensible datasets (e.g., TFDS, RLDS), enabling offline RL or proxy modeling.
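As a rough sketch of the plug-in pattern, the example below defines a hypothetical Agent base class with the two methods named in the text (select_action and update) and a simple hill-climbing subclass. The base class, the toy reward function, and the JSON-lines logging are assumptions for illustration; ArchGym's real base class and its TFDS/RLDS export are not reproduced here.

```python
import json
import random
from abc import ABC, abstractmethod


class Agent(ABC):
    """Hypothetical base class; only the two method names come from the text."""

    @abstractmethod
    def select_action(self, observation):
        """Propose the next design point."""

    @abstractmethod
    def update(self, action, reward):
        """Incorporate the reward observed for a proposed design point."""


class HillClimbAgent(Agent):
    """Mutates one knob of the best design seen so far."""

    def __init__(self, choices, seed=0):
        self.choices = choices  # knob name -> list of allowed values
        self.rng = random.Random(seed)
        self.best_action = {k: self.rng.choice(v) for k, v in choices.items()}
        self.best_reward = float("-inf")

    def select_action(self, observation):
        # Non-RL agent: the observation is ignored.
        candidate = dict(self.best_action)
        knob = self.rng.choice(sorted(self.choices))
        candidate[knob] = self.rng.choice(self.choices[knob])
        return candidate

    def update(self, action, reward):
        if reward > self.best_reward:
            self.best_action, self.best_reward = action, reward


def toy_reward(action):
    """Made-up cost model standing in for a simulator call."""
    return -(1000.0 / action["PE_count"] + 50.0 / action["buffer_size"])


choices = {"buffer_size": [1, 2, 4, 8], "PE_count": [16, 32, 64, 128, 256]}
agent = HillClimbAgent(choices, seed=1)
trajectory = []
observation = None
for _ in range(50):
    action = agent.select_action(observation)
    reward = toy_reward(action)
    agent.update(action, reward)
    trajectory.append({"observation": observation, "action": action, "reward": reward})

# One JSON record per step, suitable for later offline analysis.
log_lines = [json.dumps(record) for record in trajectory]
```

Logging every (observation, action, reward) record, rather than only the best design, is what makes the data reusable for offline training or proxy modeling.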
3. Integrated Search Algorithms and the Hyperparameter Lottery
ArchGym includes five representative agent types:
- Reinforcement Learning (PPO, SAC, DDPG): Employs neural policy and standard RL objectives with mechanisms for exploration.
- Bayesian Optimization (BO): Uses Gaussian Process surrogates, acquisition functions (e.g., UCB, Expected Improvement), and closed-loop optimization.
- Genetic Algorithm (GA): Maintains a population of genomes with explicit exploitation/exploration balance via selection, crossover, and mutation.
- Ant-Colony Optimization (ACO): Utilizes a pheromone matrix, stochastic design sampling, and pheromone decay for adaptive search.
- Random Walker (RW): Baseline agent performing uniform random sampling.
A central empirical finding is the hyperparameter lottery: with unlimited simulator samples and exhaustive hyperparameter sweeps (typically over 4,000 configurations per agent), any agent family can match or surpass the others, indicating that performance is determined as much by hyperparameter tuning as by algorithmic class. Under practical (limited) sample budgets, simpler methods (RW, GA) often match or outperform more sophisticated ones (RL, BO). With larger budgets, RL's performance rises, but the other families remain competitive.
Experimentally, the interquartile range (IQR) of final rewards can be extremely broad, reaching 90% in DRAMGym and 40% in FARSIGym solely due to hyperparameter selection, underscoring the importance of transparent hyperparameter reporting and statistical rigor (Krishnan et al., 2023).
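To make the spread statistic concrete, the sketch below computes the IQR of final rewards across a hyperparameter sweep for each agent family. The reward values are invented purely for illustration and are not results from the paper; how the paper normalizes IQR into a percentage is not specified here, so the example reports the raw IQR.

```python
import statistics


def interquartile_range(values):
    """Return Q3 - Q1 using the inclusive quartile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Invented final rewards, one value per hyperparameter configuration.
final_rewards = {
    "RW": [0.41, 0.44, 0.47, 0.52, 0.58],
    "GA": [0.30, 0.45, 0.55, 0.62, 0.71],
    "RL": [0.05, 0.20, 0.48, 0.66, 0.90],
}

spread = {agent: interquartile_range(r) for agent, r in final_rewards.items()}
for agent, iqr in sorted(spread.items()):
    print(f"{agent}: IQR of final reward = {iqr:.2f}")
```

A wide IQR for one family means that a single reported run of that agent says little about the family as a whole.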
4. Experimental Platforms, Metrics, and Results
ArchGym natively implements multiple architecture-specific environments and benchmarks:
- DRAMGym: DRAM controller optimization using DRAMSys.
- TimeloopGym: DNN accelerator optimization via Timeloop.
- FARSIGym: Custom SoC (AR/VR) design using FARSI.
- MaestroGym: DNN mapping with Maestro, searched per layer.
Reward functions allow for both single-objective and weighted multi-objective design targets. Published experiments were run under a range of simulator-call budgets, with special attention to unlimited-sample (perfect tuning) scenarios.
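One possible form of a weighted multi-objective reward is sketched below. The formula (a negated weighted sum of metrics normalized by user targets) and all names are assumptions for illustration; the source does not state the exact reward definitions used by each environment.

```python
def weighted_reward(metrics, targets, weights):
    """Negated weighted sum of target-normalized metrics (higher is better).

    metrics: measured values, e.g. {"latency": 2.0, "energy": 1.0}
    targets: user targets used to put metrics on a comparable scale
    weights: relative importance of each metric; only weighted metrics count
    """
    cost = sum(weights[name] * metrics[name] / targets[name] for name in weights)
    return -cost


metrics = {"latency": 2.0, "energy": 1.0}
targets = {"latency": 1.0, "energy": 1.0}

# Multi-objective: latency and energy weighted equally.
multi = weighted_reward(metrics, targets, {"latency": 0.5, "energy": 0.5})

# Single-objective is the special case with a single weighted metric.
single = weighted_reward(metrics, targets, {"latency": 1.0})
```

Normalizing by a target keeps metrics with different units from dominating the sum by magnitude alone.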
Key findings:
- No search algorithm dominates when unlimited samples and perfect tuning are allowed—each can reach user targets at least once.
- Under constrained resources, random or evolutionary approaches compete strongly, while RL typically requires more samples to be effective.
- Diversity of agent strategies and hyperparameter treatments drives performance variance far more than algorithm family alone (Krishnan et al., 2023).
5. Proxy Modeling for Simulation Acceleration
ArchGym's data aggregation allows the construction of ML-based proxy models. In the DRAMGym case study, a Random Forest regressor trained on data pooled from varied agents yielded a normalized RMSE of approximately 0.61% across key targets (latency, dynamic power, energy) while running much faster than full simulation. Aggregating data from heterogeneous agent runs dramatically reduces proxy error compared with single-agent datasets of equivalent size (Krishnan et al., 2023).
Proxy models can thus serve as high-throughput surrogates for expensive simulators, enabling more efficient or broader design space exploration.
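The accuracy metric can be sketched as follows. This computes a normalized RMSE between proxy predictions and simulator outputs, normalizing by the range of the true values. That normalization is an assumption, since the source does not say which convention was used, and the sample numbers are invented.

```python
import math


def normalized_rmse(predicted, actual):
    """RMSE divided by the range of the actual values (assumed convention)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal length")
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    span = max(actual) - min(actual)
    if span == 0:
        raise ValueError("actual values have zero range; cannot normalize")
    return math.sqrt(mse) / span


# Invented simulator latencies and proxy predictions for four design points.
simulated = [10.0, 12.0, 15.0, 18.0]
proxy = [10.1, 11.9, 15.2, 17.9]
print(f"normalized RMSE = {100 * normalized_rmse(proxy, simulated):.2f}%")
```

Evaluating the proxy on design points held out from training gives a more honest estimate than scoring it on the data it was fitted to.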
6. Code Structure, Extensibility, and Integration
ArchGym's modular directory structure is organized as follows:
| Directory | Purpose | Example Files |
|---|---|---|
| envs/ | Simulator environment bindings | DRAMGym.py, TimeloopGym.py |
| agents/ | ML search algorithms | ppo_agent.py, bo_agent.py, ga_agent.py |
| datasets/ | Data logging/export pipelines | loggers.py |
| utils/ | Utility functions | |
Adding a new environment involves subclassing archgym.Env, defining the action and observation spaces, and implementing the standard API. New agents are created by subclassing archgym.Agent and exposing all hyperparameters via YAML or JSON for systematic sweeping.
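Exposing hyperparameters in a config file makes systematic sweeps straightforward. The sketch below expands a JSON sweep specification into one configuration per grid point. The schema, key names, and hyperparameter values are hypothetical and do not reflect ArchGym's actual config format.

```python
import itertools
import json

# Hypothetical sweep specification; the schema is an assumption.
SWEEP_JSON = """
{
  "agent": "ga",
  "hyperparameters": {
    "population_size": [8, 16],
    "mutation_rate": [0.01, 0.05, 0.1]
  }
}
"""


def expand_sweep(spec):
    """Return one dict per point in the Cartesian product of the grid."""
    names = sorted(spec["hyperparameters"])
    grids = [spec["hyperparameters"][name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*grids)]


configs = expand_sweep(json.loads(SWEEP_JSON))
for config in configs:
    print(config)
```

Each resulting dict can be stored alongside its run's trajectory so that every result traces back to the exact hyperparameters that produced it.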
Every experiment records complete configuration, trajectory, and outcome information, supporting full reproducibility. Tie-ins to Weights & Biases and TensorBoard provide dashboard-style experiment monitoring.
7. Impact and Integration with Related Platforms
By providing a unified interface, detailed dataset export, and baseline implementations, ArchGym enables:
- Objectively fair benchmarking of ML search algorithms across diverse architectural application domains.
- Systematic collection/sharing of design-space datasets, supporting both online agent training and offline proxy modeling.
- Plug-and-play extension for emerging ML optimization algorithms or architectural simulators.
ArchGym's agent-environment abstraction is compatible with software architecture benchmarking frameworks such as ArchBench (Adnan et al., 18 Mar 2026), supporting RL/agent-in-the-loop evaluation where agent outputs are scored using external pipelines or custom metrics.
Together, these features position ArchGym as a cornerstone for reproducible, data-driven research in the application of ML to architecture design (Krishnan et al., 2023).