AMS-IO-Bench: Automated I/O Benchmark
- AMS-IO-Bench is a benchmark suite for automated design of wirebond-packaged AMS I/O rings, offering realistic and reproducible evaluation.
- It quantifies design correctness via DRC and LVS pass rates, and evaluates efficiency via design turnaround time and LLM token usage.
- The suite supports single- and dual-row pad rings in 28 nm CMOS, enabling structured, automated IC layout generation and signoff validation.
AMS-IO-Bench is a publicly released benchmark suite for the automated design of wirebond-packaged analog and mixed-signal (AMS) integrated circuit input/output (I/O) rings. Designed to provide a realistic and reproducible evaluation environment for structure-aware I/O ring assembly, it quantifies functional and physical correctness, adaptability across variable design specifications, and efficiency metrics such as design turnaround time and LLM token usage. The benchmark specifically targets single- and dual-row peripheral I/O rings for chips conforming to industry-standard foundry and packaging constraints within 28 nm CMOS technology nodes (Zhang et al., 25 Dec 2025).
1. Definition, Goals, and Scope
AMS-IO-Bench addresses the end-to-end task of transforming human-specified pin-planning tables—including signal names, power domain assignments, pad order, and routing hints—into production-grade AMS schematics and layouts suitable for foundry signoff. Signoff entails rigorous checks for both design-rule compliance (DRC) and layout-versus-schematic (LVS) equivalence. The benchmark is engineered with three primary objectives: to accurately reflect the complexity inherent in practical AMS I/O ring design, to offer quantitative metrics on design correctness and adaptability, and to evaluate agent- and LLM-driven automation in terms of solution validity and resource efficiency.
The scope is centered on wirebond-packaged chips with single- or dual-row peripheral pad rings, constrained by foundry-specific rules (pad pitch, outline perimeter, keepout margins) and packaging limitations.
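To make these constraints concrete, the back-of-envelope check below shows how pad pitch, die outline, and corner keepout bound single-row pad capacity. This is a minimal sketch: the function and all numeric values are illustrative placeholders, not real 28 nm foundry rules (which are confidential) and not part of the benchmark itself.

```python
# Back-of-envelope feasibility check for a single-row pad ring.
# All numeric values are illustrative placeholders, not real 28 nm
# foundry rules (those are confidential and technology-specific).

def max_pads_per_side(side_um: float, pad_pitch_um: float,
                      corner_keepout_um: float) -> int:
    """Number of pads that fit on one die edge after reserving corner keepout."""
    usable = side_um - 2 * corner_keepout_um
    return max(0, int(usable // pad_pitch_um))

# Example: 1 mm x 1 mm outline (the Medium-tier baseline in the benchmark)
side = 1000.0    # um, die edge length
pitch = 90.0     # um, placeholder wirebond pad pitch
keepout = 100.0  # um, placeholder corner-cell keepout

per_side = max_pads_per_side(side, pitch, keepout)
print(f"{per_side} pads/side -> {4 * per_side} pads total")  # 8 -> 32
```

With these placeholder numbers, a 1 mm × 1 mm die holds 32 single-row pads, in the range of the Medium-tier cases described in the next section; dual-row or staggered arrangements roughly double this capacity.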
2. Dataset Construction and Organization
AMS-IO-Bench is synthesized from 10 real AMS IC tape-out projects spanning five years, simplified and extended into 30 benchmark cases while retaining critical design constraints. The suite is partitioned by difficulty:
- Simple (10 cases): Single power domain, 16–24 pads, straightforward pad sequencing.
- Medium (10 cases): 1 mm × 1 mm outline, multi-domain I/O ring (32–48 pads), mixed analog/digital domains, mandatory isolation and filler cell insertion.
- Hard (10 cases): Dual-row or staggered pad layouts, 1.5×–2× baseline chip outlines, up to 80 pads, custom low-ESD-capacitance pads, localized ESD power delivery, and highly specialized domain partitioning.
Each case is defined by a structured pad list in JSON format, specifying pad names, intended signal types, power domains, preferred chip sides, and optional routing hints. A reference gold-standard layout is supplied for each test, allowing quantitative shape-score comparison against agent-generated outputs.
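The paper's exact JSON schema is not reproduced in this summary; the sketch below shows a plausible shape for one case's pad list, covering the attributes named above. All field names, values, and the output filename are illustrative assumptions.

```python
import json

# Hypothetical pad-list entries; field names are illustrative assumptions,
# not the benchmark's published schema. Each benchmark case is a list of
# such records, one per pad.
pad_list = [
    {"name": "VDD_A",  "signal_type": "power",   "domain": "AVDD18",
     "side": "left",   "routing_hint": None},
    {"name": "AIN0",   "signal_type": "analog",  "domain": "AVDD18",
     "side": "left",   "routing_hint": "keep adjacent to AIN1"},
    {"name": "CLK_IN", "signal_type": "digital", "domain": "DVDD09",
     "side": "top",    "routing_hint": None},
]

# Hypothetical case filename for illustration only.
with open("case_simple_01.json", "w") as f:
    json.dump(pad_list, f, indent=2)
```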
3. Formal Metrics and Evaluation Protocol
The AMS-IO-Bench formal evaluation defines the following core metrics over the $N = 30$ benchmark cases:
- Intent Graph Pass Rate ($P_{\mathrm{intent}}$): $P_{\mathrm{intent}} = N_{\mathrm{intent}} / N$, where $N_{\mathrm{intent}}$ is the number of cases for which the agent generates a structurally valid JSON intent graph (valid pad names, device types, completed attributes).
- Shape Pass Rate ($P_{\mathrm{shape}}$): computed by a Vision-LLM that verifies the agent's pad-ring layout against the gold-standard reference.
- DRC Pass Rate ($P_{\mathrm{DRC}}$): $P_{\mathrm{DRC}} = N_{\mathrm{DRC}} / N$, counting cases passing all foundry DRC rules.
- LVS Pass Rate ($P_{\mathrm{LVS}}$): $P_{\mathrm{LVS}} = N_{\mathrm{LVS}} / N$, counting cases whose layout matches the electrical schematic.
- Combined DRC+LVS Pass Rate ($P_{\mathrm{DRC+LVS}}$): $P_{\mathrm{DRC+LVS}} = N_{\mathrm{DRC+LVS}} / N$, counting cases that pass both checks.
Design turnaround time for each case ($T_i$) is aggregated as the mean $\bar{T} = \frac{1}{N}\sum_{i=1}^{N} T_i$, and LLM token usage per case is tracked as a secondary measure of computational efficiency.
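As a minimal sketch of this bookkeeping, the fragment below computes the pass rates and mean turnaround from per-case results; the class and field names are illustrative assumptions, not a published API of the benchmark.

```python
from dataclasses import dataclass
from statistics import mean

# Minimal sketch of the pass-rate bookkeeping defined above; names are
# illustrative, not the benchmark's published API.

@dataclass
class CaseResult:
    intent_ok: bool   # structurally valid intent graph
    shape_ok: bool    # Vision-LLM layout check vs. gold standard
    drc_ok: bool      # all foundry DRC rules clean
    lvs_ok: bool      # layout matches schematic
    minutes: float    # wall-clock turnaround
    tokens: int       # LLM tokens consumed

def pass_rates(results: list[CaseResult]) -> dict[str, float]:
    n = len(results)
    return {
        "P_intent":  sum(r.intent_ok for r in results) / n,
        "P_shape":   sum(r.shape_ok for r in results) / n,
        "P_drc":     sum(r.drc_ok for r in results) / n,
        "P_lvs":     sum(r.lvs_ok for r in results) / n,
        "P_drc_lvs": sum(r.drc_ok and r.lvs_ok for r in results) / n,
        "T_mean":    mean(r.minutes for r in results),
    }
```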
4. End-to-End Benchmarking Workflow
Benchmark execution requires:
- Python 3.9+ with the smolagents framework
- Backbone LLM API access (e.g., GPT-4o, Claude-3.7, DeepSeek-V3)
- Cadence Virtuoso with SKILL scripting for schematic and layout generation
- Siemens Calibre for automated DRC and LVS verification via csh scripting
The procedure consists of:
- Preparation of intent-graph JSON files detailing pad specifications.
- Invocation of AMS-IO-Agent through the Python API, supplying the domain knowledge base.
- Agent generation of a completed intent graph with inferred attributes such as pad directionality and device variants.
- Parsing and geometric rule resolution by the Intent Graph Adaptor, including computation of pad coordinates and cell instantiation via SKILL scripts for Virtuoso (a geometry sketch follows this list).
- GDSII layout and schematic extraction in Virtuoso.
- Calibre-based DRC and LVS checks, with parsing of report files (DRC.LST, LVS.LST) for pass/fail metrics.
- Runtime and LLM token usage recording through the Python harness.
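The sketch below illustrates the kind of geometric resolution the Intent Graph Adaptor performs, evenly spacing an ordered pad list along one die edge; in the actual flow such coordinates would be emitted as SKILL scripts for Virtuoso rather than computed standalone. The function name, pitch, and keepout values are illustrative assumptions.

```python
# Illustrative sketch of the geometric step the Intent Graph Adaptor
# performs: converting an ordered per-side pad list into placement
# origins. Pitch/keepout values are placeholders, not real foundry rules.

def place_side(names: list[str], side: str, die_um: float,
               pitch_um: float, keepout_um: float) -> dict[str, tuple]:
    """Evenly space pads along one die edge, rotating each pad to face outward."""
    coords = {}
    start = keepout_um + pitch_um / 2
    for i, name in enumerate(names):
        offset = start + i * pitch_um
        if side == "bottom":
            coords[name] = (offset, 0.0, "R0")             # (x, y, orientation)
        elif side == "right":
            coords[name] = (die_um, offset, "R90")
        elif side == "top":
            coords[name] = (die_um - offset, die_um, "R180")
        elif side == "left":
            coords[name] = (0.0, die_um - offset, "R270")
    return coords

print(place_side(["AIN0", "AIN1", "VDD_A"], "left", 1000.0, 90.0, 100.0))
```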
5. Empirical Results
The performance evaluation compares a human baseline, a vanilla LLM, and AMS-IO-Agent across three backbone LLMs:
| Method | $P_{\mathrm{DRC+LVS}}$ (%) | $\bar{T}$ (min) | Tokens (k) |
|---|---|---|---|
| Human baseline | 100 | 480 | — |
| Vanilla LLM (GPT-4o direct) | 0 | 0.2 | 1 |
| AMS-IO-Agent (GPT-4o) | 63.3 | 4.1 | 160 |
| AMS-IO-Agent (Claude-3.7) | 76.7 | 4.2 | 96 |
| AMS-IO-Agent (DeepSeek-V3) | 76.7 | 5.1 | 105 |
Difficulty breakdown (DeepSeek-V3):
| Difficulty | Pass/Total |
|---|---|
| Simple | 10/10 |
| Medium | 10/10 |
| Hard | 3/10 |
On AMS-IO-Bench, AMS-IO-Agent demonstrates combined DRC+LVS signoff success rates of 63.3–76.7% with average turnaround times of roughly 4–5 minutes per case, a substantial reduction in human labor relative to the 8-hour baseline for expert manual layout.
6. Limitations and Prospective Extensions
AMS-IO-Bench's current implementation is restricted to wirebond packaging and 28 nm foundry rules. Adaptation to alternative packaging modalities (e.g., flip-chip bumps, wafer-level CSP) or other technology nodes requires modification of the domain knowledge base and geometric rule sets. The present domain knowledge base comprises a curated 6 k-token corpus tailored to established design conventions; supporting nonstandard pad libraries or foundry-specific requirements may require manual augmentation. While over 70% of benchmark cases reach signoff quality fully automatically under the agent, expert review remains recommended for atypical or highly specialized design tasks.
Planned future work includes extending AMS-IO-Bench to testbench generation, advanced routing operations, benchmarking for multi-step routing, and integration of analog performance metrics post-placement to reflect broader AMS IC design validation scenarios.
7. Context and Significance
AMS-IO-Bench constitutes the first freely available, realistic benchmark for automated wirebond-packaged AMS I/O ring design that links structured natural-language design intent to actionable IC deliverables via LLM agent workflows. It enables direct evaluation and scaling of domain-specialized agents in human-agent collaborative IC design, a significant methodological advance in benchmarking LLM automation within the AMS physical design domain (Zhang et al., 25 Dec 2025). The tape-out of agent-generated outputs in 28 nm silicon establishes practical feasibility for industrial design flows and supports reproducible research on algorithmic AMS IC layout generation and structured reasoning-based automation. AMS-IO-Bench is thus positioned as a foundational resource for advancing automated mixed-signal physical design and agent-driven workflow validation.