AMS-IO-Bench: Automated I/O Benchmark
- AMS-IO-Bench is a benchmark suite for automated design of wirebond-packaged AMS I/O rings, offering realistic and reproducible evaluation.
- It quantifies design correctness via DRC and LVS pass rates, and evaluates efficiency via design turnaround time and LLM token usage.
- The suite supports single- and dual-row pad rings in 28 nm CMOS, enabling structured, automated IC layout generation and signoff validation.
AMS-IO-Bench is a publicly released benchmark suite for the automated design of wirebond-packaged analog and mixed-signal (AMS) integrated circuit input/output (I/O) rings. Designed to provide a realistic and reproducible evaluation environment for structure-aware I/O ring assembly, it quantifies functional and physical correctness, adaptability across variable design specifications, and efficiency metrics such as design turnaround time and LLM token usage. The benchmark specifically targets single- and dual-row peripheral I/O rings for chips conforming to industry-standard foundry and packaging constraints within 28 nm CMOS technology nodes (Zhang et al., 25 Dec 2025).
1. Definition, Goals, and Scope
AMS-IO-Bench addresses the end-to-end task of transforming human-specified pin-planning tables—including signal names, power domain assignments, pad order, and routing hints—into production-grade AMS schematics and layouts suitable for foundry signoff. Signoff entails rigorous checks for both design-rule compliance (DRC) and layout-versus-schematic (LVS) equivalence. The benchmark is engineered with three primary objectives: to accurately reflect the complexity inherent in practical AMS I/O ring design, to offer quantitative metrics on design correctness and adaptability, and to evaluate agent- and LLM-driven automation in terms of solution validity and resource efficiency.
The scope is centered on wirebond-packaged chips with single- or dual-row peripheral pad rings, constrained by foundry-specific rules (pad pitch, outline perimeter, keepout margins) and packaging limitations.
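To make these constraints concrete, the back-of-envelope check below shows how pad pitch, die outline, and corner keepout bound single-row pad capacity. This is a minimal sketch: the function and all numeric values are illustrative placeholders, not real 28 nm foundry rules (which are confidential) and not part of the benchmark itself.

```python
# Back-of-envelope feasibility check for a single-row pad ring.
# All numeric values are illustrative placeholders, not real 28 nm
# foundry rules (those are confidential and technology-specific).

def max_pads_per_side(side_um: float, pad_pitch_um: float,
                      corner_keepout_um: float) -> int:
    """Number of pads that fit on one die edge after reserving corner keepout."""
    usable = side_um - 2 * corner_keepout_um
    return max(0, int(usable // pad_pitch_um))

# Example: 1 mm x 1 mm outline (the Medium-tier baseline in the benchmark)
side = 1000.0    # um, die edge length
pitch = 90.0     # um, placeholder wirebond pad pitch
keepout = 100.0  # um, placeholder corner-cell keepout

per_side = max_pads_per_side(side, pitch, keepout)
print(f"{per_side} pads/side -> {4 * per_side} pads total")  # 8 -> 32
```

With these placeholder numbers, a 1 mm × 1 mm die holds 32 single-row pads, in the range of the Medium-tier cases described in the next section; dual-row or staggered arrangements roughly double this capacity.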
2. Dataset Construction and Organization
AMS-IO-Bench is synthesized from 10 real AMS IC tape-out projects spanning five years, simplified and extended into 30 benchmark cases while retaining critical design constraints. The suite is partitioned by difficulty:
- Simple (10 cases): Single power domain, 16–24 pads, straightforward pad sequencing.
- Medium (10 cases): 1 mm × 1 mm outline, multi-domain I/O ring (32–48 pads), mixed analog/digital domains, mandatory isolation and filler cell insertion.
- Hard (10 cases): Dual-row or staggered pad layouts, 1.5×–2× baseline chip outlines, up to 80 pads, custom low-ESD-capacitance pads, localized ESD power delivery, and highly specialized domain partitioning.
Each case is defined by a structured pad list in JSON format, specifying pad names, intended signal types, power domains, preferred chip sides, and optional routing hints. A reference gold-standard layout is supplied for each test, allowing quantitative shape-score comparison against agent-generated outputs.
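The paper's exact JSON schema is not reproduced in this summary; the sketch below shows a plausible shape for one case's pad list, covering the attributes named above. All field names, values, and the output filename are illustrative assumptions.

```python
import json

# Hypothetical pad-list entries; field names are illustrative assumptions,
# not the benchmark's published schema. Each benchmark case is a list of
# such records, one per pad.
pad_list = [
    {"name": "VDD_A",  "signal_type": "power",   "domain": "AVDD18",
     "side": "left",   "routing_hint": None},
    {"name": "AIN0",   "signal_type": "analog",  "domain": "AVDD18",
     "side": "left",   "routing_hint": "keep adjacent to AIN1"},
    {"name": "CLK_IN", "signal_type": "digital", "domain": "DVDD09",
     "side": "top",    "routing_hint": None},
]

# Hypothetical case filename for illustration only.
with open("case_simple_01.json", "w") as f:
    json.dump(pad_list, f, indent=2)
```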
3. Formal Metrics and Evaluation Protocol
The AMS-IO-Bench formal evaluation defines the following core metrics over the $N = 30$ benchmark cases:
- Intent Graph Pass Rate ($P_{\mathrm{intent}}$): $P_{\mathrm{intent}} = N_{\mathrm{intent}} / N$, where $N_{\mathrm{intent}}$ is the number of cases for which the agent generates a structurally valid JSON intent graph (valid pad names, device types, completed attributes).
- Shape Pass Rate ($P_{\mathrm{shape}}$): computed by a Vision-LLM that verifies the agent's pad-ring layout against the gold-standard reference.
- DRC Pass Rate ($P_{\mathrm{DRC}}$): $P_{\mathrm{DRC}} = N_{\mathrm{DRC}} / N$, counting cases passing all foundry DRC rules.
- LVS Pass Rate ($P_{\mathrm{LVS}}$): $P_{\mathrm{LVS}} = N_{\mathrm{LVS}} / N$, counting cases whose layout matches the electrical schematic.
- Combined DRC+LVS Pass Rate ($P_{\mathrm{DRC+LVS}}$): $P_{\mathrm{DRC+LVS}} = N_{\mathrm{DRC+LVS}} / N$, counting cases that pass both checks.
Design turnaround time for each case ($T_i$) is aggregated as the mean $\bar{T} = \frac{1}{N}\sum_{i=1}^{N} T_i$, and LLM token usage per case is tracked as a secondary measure of computational efficiency.
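As a minimal sketch of this bookkeeping, the fragment below computes the pass rates and mean turnaround from per-case results; the class and field names are illustrative assumptions, not a published API of the benchmark.

```python
from dataclasses import dataclass
from statistics import mean

# Minimal sketch of the pass-rate bookkeeping defined above; names are
# illustrative, not the benchmark's published API.

@dataclass
class CaseResult:
    intent_ok: bool   # structurally valid intent graph
    shape_ok: bool    # Vision-LLM layout check vs. gold standard
    drc_ok: bool      # all foundry DRC rules clean
    lvs_ok: bool      # layout matches schematic
    minutes: float    # wall-clock turnaround
    tokens: int       # LLM tokens consumed

def pass_rates(results: list[CaseResult]) -> dict[str, float]:
    n = len(results)
    return {
        "P_intent":  sum(r.intent_ok for r in results) / n,
        "P_shape":   sum(r.shape_ok for r in results) / n,
        "P_drc":     sum(r.drc_ok for r in results) / n,
        "P_lvs":     sum(r.lvs_ok for r in results) / n,
        "P_drc_lvs": sum(r.drc_ok and r.lvs_ok for r in results) / n,
        "T_mean":    mean(r.minutes for r in results),
    }
```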
4. End-to-End Benchmarking Workflow
Benchmark execution requires:
- Python 3.9+ with the smolagents framework
- Backbone LLM API access (e.g., GPT-4o, Claude-3.7, DeepSeek-V3)
- Cadence Virtuoso with SKILL scripting for schematic and layout generation
- Siemens Calibre for automated DRC and LVS verification via csh scripting
The procedure consists of:
- Preparation of intent-graph JSON files detailing pad specifications.
- Invocation of AMS-IO-Agent through the Python API, supplying the domain knowledge base.
- Agent generation of a completed intent graph with inferred attributes such as pad directionality and device variants.
- Parsing and geometric rule resolution by the Intent Graph Adaptor, including computation of pad coordinates and cell instantiation via SKILL scripts for Virtuoso (a geometry sketch follows this list).
- GDSII layout and schematic extraction in Virtuoso.
- Calibre-based DRC and LVS checks, with parsing of report files (DRC.LST, LVS.LST) for pass/fail metrics.
- Runtime and LLM token usage recording through the Python harness.
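The sketch below illustrates the kind of geometric resolution the Intent Graph Adaptor performs, evenly spacing an ordered pad list along one die edge; in the actual flow such coordinates would be emitted as SKILL scripts for Virtuoso rather than computed standalone. The function name, pitch, and keepout values are illustrative assumptions.

```python
# Illustrative sketch of the geometric step the Intent Graph Adaptor
# performs: converting an ordered per-side pad list into placement
# origins. Pitch/keepout values are placeholders, not real foundry rules.

def place_side(names: list[str], side: str, die_um: float,
               pitch_um: float, keepout_um: float) -> dict[str, tuple]:
    """Evenly space pads along one die edge, rotating each pad to face outward."""
    coords = {}
    start = keepout_um + pitch_um / 2
    for i, name in enumerate(names):
        offset = start + i * pitch_um
        if side == "bottom":
            coords[name] = (offset, 0.0, "R0")             # (x, y, orientation)
        elif side == "right":
            coords[name] = (die_um, offset, "R90")
        elif side == "top":
            coords[name] = (die_um - offset, die_um, "R180")
        elif side == "left":
            coords[name] = (0.0, die_um - offset, "R270")
    return coords

print(place_side(["AIN0", "AIN1", "VDD_A"], "left", 1000.0, 90.0, 100.0))
```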
5. Empirical Results
The performance evaluation compares a human baseline, a vanilla LLM, and AMS-IO-Agent across three backbone LLMs:
| Method | $P_{\mathrm{DRC+LVS}}$ (%) | $\bar{T}$ (min) | Tokens (k) |
|---|---|---|---|
| Human baseline | 100 | 480 | — |
| Vanilla LLM (GPT-4o direct) | 0 | 0.2 | 1 |
| AMS-IO-Agent (GPT-4o) | 63.3 | 4.1 | 160 |
| AMS-IO-Agent (Claude-3.7) | 76.7 | 4.2 | 96 |
| AMS-IO-Agent (DeepSeek-V3) | 76.7 | 5.1 | 105 |
Difficulty breakdown (DeepSeek-V3):
| Difficulty | Pass/Total |
|---|---|
| Simple | 10/10 |
| Medium | 10/10 |
| Hard | 3/10 |
On AMS-IO-Bench, AMS-IO-Agent demonstrates combined DRC+LVS signoff success rates of 63.3–76.7% with average turnaround times of roughly 4–5 minutes per case, a substantial reduction in human labor relative to the 8-hour baseline for expert manual layout.
6. Limitations and Prospective Extensions
AMS-IO-Bench's current implementation is restricted to wirebond packaging and 28 nm foundry rules. Adaptation to alternative packaging modalities (e.g., flip-chip bumps, wafer-level CSP) or other technology nodes requires modification of the domain knowledge base and geometric rule sets. The present domain knowledge base comprises a curated 6 k-token corpus tailored to established design conventions; supporting nonstandard pad libraries or foundry-specific requirements may require manual augmentation. While over 70% of benchmark cases reach signoff quality fully automatically under the agent, expert review remains recommended for atypical or highly specialized design tasks.
Planned future work includes extending AMS-IO-Bench to testbench generation, advanced routing operations, benchmarking for multi-step routing, and integration of analog performance metrics post-placement to reflect broader AMS IC design validation scenarios.
7. Context and Significance
AMS-IO-Bench constitutes the first freely available, realistic benchmark for automated wirebond-packaged AMS I/O ring design that links structured natural-language design intent to actionable IC deliverables via LLM agent workflows. It enables direct evaluation and scaling of domain-specialized agents in human-agent collaborative IC design, a significant methodological advance in benchmarking LLM automation within the AMS physical design domain (Zhang et al., 25 Dec 2025). The tape-out of agent-generated outputs in 28 nm silicon establishes practical feasibility for industrial design flows and supports reproducible research on algorithmic AMS IC layout generation and structured reasoning-based automation. AMS-IO-Bench is thus positioned as a foundational resource for advancing automated mixed-signal physical design and agent-driven workflow validation.