
ZeroSim: Transformer Analog Modeling

Updated 17 November 2025
  • ZeroSim is a transformer-based analog circuit performance modeling framework that provides zero-shot prediction of circuit metrics across diverse amplifier topologies.
  • It employs a hierarchical graph attention encoder with progressive parameter injection and a global topology token to capture both local and global circuit behaviors.
  • Integrated into RL-based device sizing loops, ZeroSim drastically accelerates design iterations, achieving up to a 13× speedup over traditional SPICE simulation.

ZeroSim is a transformer-based analog circuit performance modeling framework designed for robust, zero-shot prediction of circuit metrics across unseen amplifier topologies, eliminating the need for topology-specific retraining or manual fine-tuning. It addresses the major bottlenecks of circuit evaluation in analog design automation by replacing expensive SPICE simulations with a unified, data-driven surrogate that demonstrates both in-distribution and zero-shot generalization. The core innovation of ZeroSim lies in its architectural integration of hierarchical graph attention, progressive parameter injection, and a globally-conditioned topology embedding, trained on an extensive corpus spanning a wide variety of circuit structures and parameter configurations.

1. Model Architecture

ZeroSim operates by translating a circuit schematic into a pin-level undirected graph $G = (V, E)$, where each node represents a pin (e.g. drain, gate, source, and bulk of a MOSFET) and edges represent both physical wiring and virtual device-centric connectivity. This graph representation captures both inter-pin and intra-device relationships. A global token $[G]$ is introduced to aggregate and propagate the entire circuit’s context:

$$[G] \leftrightarrow \{ v_1, v_2, \ldots, v_{|V|} \}$$

Initial node embeddings $H^{(0)}$ consist of pin-type and device-type look-ups concatenated with the learnable global token.
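As an illustration of this lowering step, the sketch below builds the pin-level graph (wiring plus virtual intra-device edges) and the initial embeddings $H^{(0)}$ from a toy netlist description. The netlist format, pin/device vocabularies, and class names are assumptions made for the example, not ZeroSim's actual interface.

import torch
import torch.nn as nn

# Hypothetical vocabularies; the paper's exact categories are not restated here.
PIN_TYPES = {"drain": 0, "gate": 1, "source": 2, "bulk": 3, "plus": 4, "minus": 5}
DEV_TYPES = {"nmos": 0, "pmos": 1, "res": 2, "cap": 3, "isrc": 4}

def build_pin_graph(devices):
    """Build pin nodes and an edge list from (device_type, {pin_name: net_id}) tuples.
    Edges cover physical wiring (pins on the same net) and virtual device-centric
    connectivity (pins belonging to the same device)."""
    pins, edges, net_members = [], [], {}
    for dev_type, pin_map in devices:
        dev_pin_ids = []
        for pin_name, net_id in pin_map.items():
            pid = len(pins)
            pins.append((PIN_TYPES[pin_name], DEV_TYPES[dev_type]))
            dev_pin_ids.append(pid)
            net_members.setdefault(net_id, []).append(pid)
        edges += [(a, b) for a in dev_pin_ids for b in dev_pin_ids if a != b]
    for members in net_members.values():
        edges += [(a, b) for a in members for b in members if a != b]
    return pins, edges

class NodeEmbedding(nn.Module):
    """Pin-type and device-type look-ups, prepended with a learnable global token [G]."""
    def __init__(self, d_model=128):
        super().__init__()
        self.pin_emb = nn.Embedding(len(PIN_TYPES), d_model // 2)
        self.dev_emb = nn.Embedding(len(DEV_TYPES), d_model // 2)
        self.global_token = nn.Parameter(torch.randn(1, d_model))

    def forward(self, pins):
        pin_ids = torch.tensor([p for p, _ in pins])
        dev_ids = torch.tensor([d for _, d in pins])
        h = torch.cat([self.pin_emb(pin_ids), self.dev_emb(dev_ids)], dim=-1)
        return torch.cat([self.global_token, h], dim=0)  # H^(0): [1 + |V|, d_model]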

The core encoder alternates between two modes across $L$ transformer layers:

  • Structure-Level Refining: Attention is masked ($M_{\text{local}}$) so each pin interacts only with wire-connected and co-device pins:

$$\hat{H}^{(l)} = \text{MHA}(H^{(l-1)}, H^{(l-1)}, H^{(l-1)}; M_{\text{local}})$$

  • Context-Level Enhancing: Uses full attention, enabling all pins and $[G]$ to interact without masking:

$$\tilde{H}^{(l)} = \text{MHA}(H^{(l-1)}, H^{(l-1)}, H^{(l-1)})$$

Parameter tokens $r_{ij}$ representing each device parameter $p_{ij}$ are progressively injected in dedicated layers via device-masked cross-attention:

$$h_v^{(l)} = \text{MHA}(h_v^{(l-1)}, \{ r_{ij} \}_j, \{ r_{ij} \}_j; M_{\text{device}})$$

This preserves topology-agnostic structural encoding in the lower encoder layers and introduces parameter awareness only after sufficient structure extraction.
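The alternation between structure-level and context-level attention, plus the device-masked parameter injection, can be sketched with standard multi-head attention modules. The sketch below is an assumption-laden illustration (layer sizes, residual/normalization placement, and mask construction are not specified above), not the reference implementation.

import torch
import torch.nn as nn

class HierarchicalEncoderLayer(nn.Module):
    """One illustrative encoder step: masked self-attention (structure-level refining),
    full self-attention (context-level enhancing), and, in dedicated layers only,
    device-masked cross-attention that injects parameter tokens r_ij."""
    def __init__(self, d_model=128, n_heads=4, inject_params=False):
        super().__init__()
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.param_attn = (nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                           if inject_params else None)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, h, local_mask, param_tokens=None, device_mask=None):
        # Structure-level refining: each pin attends only to wire-connected and co-device pins.
        x, _ = self.local_attn(h, h, h, attn_mask=local_mask)
        h = self.norm1(h + x)
        # Context-level enhancing: full attention, so every pin and [G] can interact.
        x, _ = self.global_attn(h, h, h)
        h = self.norm2(h + x)
        # Progressive parameter injection via device-masked cross-attention.
        if self.param_attn is not None and param_tokens is not None:
            x, _ = self.param_attn(h, param_tokens, param_tokens, attn_mask=device_mask)
            h = h + x
        return h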

A query-based decoder comprises $K$ independent learnable tokens $\{q_{p_1}, \ldots, q_{p_K}\}$, one per target performance metric. These attend over the encoder outputs using two transformer layers, and the outputs are individually mapped to metric predictions $\hat{y}_i$ via a linear head.
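A minimal sketch of such a query-token decoder, assuming a standard two-layer transformer decoder and one linear head per metric; the metric count of 11 follows the corpus description in the next section, and the remaining dimensions are illustrative.

import torch
import torch.nn as nn

class MetricDecoder(nn.Module):
    """K learnable query tokens attend over the encoder outputs; each query is
    mapped to one performance metric by its own linear head."""
    def __init__(self, d_model=128, n_heads=4, num_metrics=11):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_metrics, d_model))
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.heads = nn.ModuleList(nn.Linear(d_model, 1) for _ in range(num_metrics))

    def forward(self, enc_out):                        # enc_out: [B, 1 + |V|, d_model]
        q = self.queries.unsqueeze(0).expand(enc_out.size(0), -1, -1)
        dec = self.decoder(q, enc_out)                 # queries cross-attend to encoder outputs
        preds = [head(dec[:, i]) for i, head in enumerate(self.heads)]
        return torch.cat(preds, dim=-1)                # [B, K] metric predictions ŷ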

2. Enabling Strategies

ZeroSim’s generalization and scalability arise from three principal strategies:

  • Large-Scale Training Corpus: The framework is trained on 3.6 million circuit instances, representing over 60 amplifier topologies with device counts ranging from 6 to 39. Each topology is parameterized using ranges suitable for the Sky130 PDK: $W \in [0.2, 10]\,\mu\text{m}$, $L \in [0.13, 1]\,\mu\text{m}$, $M \in [1, 100]$, $C \in [1, 100]\,\text{pF}$, $R \in [0.1, 1000]\,\text{k}\Omega$, $I_{\text{bias}} \in [1, 40]\,\mu\text{A}$. For each topology, 60,000 random parameter sets are simulated, generating ground truth for 11 key metrics (power, DC gain, GBW, phase margin, slew rate, settling time, CMRR, PSRR+, PSRR–, offset, temperature coefficient); a sampling sketch follows this list.
  • Unified Topology Embeddings: The encoder uses a global-aware token and hierarchical attention that alternately restricts and enhances information flow, abstracting both local device/pin interactions and global circuit behaviors. This approach allows the model to embed any pin-level graph derived from a schematic, independent of topology.
  • Topology-Conditioned Parameter Mapping: By strictly separating structure-only encoding from parameter fusion, ZeroSim ensures that the learned structural representation generalizes across topologies and is not specific to any single parameterization. Device-masked cross-attention maintains locality when injecting parameters, ensuring a consistent mapping for each device regardless of topology.
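To make the corpus construction concrete, the following minimal sketch draws one random parameter set per device within the ranges listed above. The uniform sampling distributions, the device-kind names, and the helper function are illustrative assumptions rather than the paper's data-generation code.

import random

# Parameter ranges quoted above (Sky130-oriented); units noted in comments.
RANGES = {
    "W": (0.2, 10.0),      # um
    "L": (0.13, 1.0),      # um
    "M": (1, 100),         # device multiplier (integer)
    "C": (1.0, 100.0),     # pF
    "R": (0.1, 1000.0),    # kOhm
    "Ibias": (1.0, 40.0),  # uA
}

def sample_device_params(device_kind):
    """Draw one random parameter set for a device of the given kind."""
    if device_kind in ("nmos", "pmos"):
        return {"W": random.uniform(*RANGES["W"]),
                "L": random.uniform(*RANGES["L"]),
                "M": random.randint(*RANGES["M"])}
    if device_kind == "cap":
        return {"C": random.uniform(*RANGES["C"])}
    if device_kind == "res":
        return {"R": random.uniform(*RANGES["R"])}
    if device_kind == "isrc":
        return {"Ibias": random.uniform(*RANGES["Ibias"])}
    raise ValueError(f"unknown device kind: {device_kind}")

# For each topology, 60,000 such parameter sets would be simulated in SPICE to
# produce ground truth for the 11 reported metrics.
example = [sample_device_params(kind) for kind in ("nmos", "pmos", "cap", "isrc")]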

3. Training Procedure and Evaluation Metrics

ZeroSim is trained using mean absolute percentage error (MAPE) as the primary loss:

$$L_{\text{MAPE}} = \frac{1}{K} \sum_{k=1}^{K} \frac{\left| y_k - \hat{y}_k \right|}{\left| y_k \right|}$$

The Adam optimizer is employed with $lr = 5 \times 10^{-4}$, cosine decay, and gradient clipping. The batch size is 256, and training runs for 200 epochs on dual A100 GPUs. Each metric is normalized using train-set statistics.
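These choices translate into a short training-step sketch. The MAPE loss follows the formula above; the epsilon term, the cosine schedule horizon, and the clipping threshold of 1.0 are assumptions added for numerical stability and concreteness.

import torch

def mape_loss(y_hat, y, eps=1e-8):
    """Mean absolute percentage error over the K metrics (eps added for stability)."""
    return (torch.abs(y - y_hat) / (torch.abs(y) + eps)).mean()

# Optimizer setup matching the stated hyperparameters; the schedule horizon and
# the clip norm of 1.0 are illustrative assumptions.
model = torch.nn.Linear(16, 11)   # stand-in for the ZeroSim encoder-decoder
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

def train_step(x, y):
    optimizer.zero_grad()
    loss = mape_loss(model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()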

Performance is also reported using an accuracy metric, $\text{Acc}@K$ (with $K = 10$), which computes the fraction of predictions whose error falls within $(\max y - \min y)/K$:

$$\text{Acc}@K = \frac{1}{N} \sum_{j=1}^{N} \mathbb{1}\left[ |y_j - \hat{y}_j| \leq \frac{\max_i y_i - \min_i y_i}{K} \right]$$
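A direct transcription of this definition (for a single metric over the evaluation set) might look as follows; the example values are made up.

import numpy as np

def acc_at_k(y_true, y_pred, k=10):
    """Fraction of predictions within (max(y) - min(y)) / k of ground truth,
    computed for one metric over an evaluation set of size N."""
    tol = (y_true.max() - y_true.min()) / k
    return float(np.mean(np.abs(y_true - y_pred) <= tol))

# Example on a single metric (made-up DC-gain values in dB):
y_true = np.array([60.0, 72.5, 55.1, 68.4])
y_pred = np.array([61.2, 70.9, 54.8, 69.0])
print(acc_at_k(y_true, y_pred, k=10))   # -> 1.0, all errors within 1.74 dB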

4. Experimental Results and Comparative Analysis

Empirical results demonstrate that ZeroSim achieves superior performance for both in-distribution and zero-shot settings. The following table summarizes key quantitative results (from Table III):

Model      MAPE (zero-shot) ↓    Acc@10 (zero-shot) ↑
MLP        0.451                 0.015
GCN        0.256                 0.367
DeepGEN    0.214                 0.493
GTN        0.192                 0.542
ZeroSim    0.143                 0.645

ZeroSim achieves a 33% relative reduction in zero-shot MAPE versus the strongest GNN baseline (DeepGEN: 0.214 $\to$ 0.143) and improves zero-shot $\text{Acc}@10$ from 0.542 to 0.645 (roughly a 19% relative gain) over the graph transformer (GTN) baseline.

Limitations of prior work are evident: MLPs perform poorly because they do not model circuit structure; GCNs and DeepGEN improve on local feature aggregation but struggle to generalize to novel topologies; GTN’s attention over graph nodes captures long-range dependencies but does not fully transfer to unseen configurations. ZeroSim’s hierarchical encoder and global topology token yield substantive gains in robustness across topologies.

5. Integration in Reinforcement Learning-Based Device Sizing

ZeroSim functions as a surrogate evaluator within RL-based optimization loops, specifically in the AnalogGym environment. The standard workflow is:

  1. At each episode, an RL policy $\pi_\theta$ samples a parameter set $x$ for the current topology.
  2. ZeroSim predicts the circuit metrics, providing rapid inference for the reward computation: $r \leftarrow \text{FoM}(\hat{y})$.
  3. The RL agent (PPO/REINFORCE) updates $\pi_\theta$ using the policy gradient on $r$.
  4. Periodically, ground truth is obtained from a full SPICE simulation for bias monitoring.

Pseudocode:

initialize policy π_θ
for episode = 1 … N do
  x ← π_θ(·)                         # sample a parameter set for the current topology
  ŷ ← ZeroSim.predict(graph, x)      # fast surrogate prediction of the performance metrics
  r ← FoM(ŷ)                         # scalar reward from predicted metrics
  update π_θ via PPO/REINFORCE on r  # policy-gradient step
  every M steps:                     # periodic bias monitoring against ground truth
    y_spice ← SPICE.simulate(graph, x)
    r_spice ← FoM(y_spice)
    log |r − r_spice|
end
validate the final x* with a full SPICE simulation

In an evaluation on an unseen 10-device amplifier (NMCF), ZeroSim accelerates RL convergence (FoM $\to 0$) by approximately $13\times$ versus SPICE-in-the-loop evaluation, where the FoM is a scalar summary of design-metric constraint satisfaction ($0$ is best).
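The text characterizes the FoM only as a scalar constraint-satisfaction summary with 0 as the optimum, so the following is one plausible, assumed form (a sum of normalized spec violations), not the paper's definition.

def fom(metrics, specs):
    """Assumed figure of merit: sum of normalized spec violations, 0 when all specs
    are met. `specs` maps a metric name to (target, direction), direction '>=' or '<='."""
    total = 0.0
    for name, (target, direction) in specs.items():
        value = metrics[name]
        if direction == ">=":
            violation = max(0.0, (target - value) / abs(target))
        else:  # "<="
            violation = max(0.0, (value - target) / abs(target))
        total += violation
    return total

# Example: gain spec met, power spec exceeded by 20%.
print(fom({"dc_gain_db": 65.0, "power_uw": 120.0},
          {"dc_gain_db": (60.0, ">="), "power_uw": (100.0, "<=")}))   # -> 0.2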

6. Practical Implications and Future Directions

ZeroSim’s approach (a large heterogeneous training set, a unified hierarchical transformer graph encoder, a global circuit token, and progressive parameter injection) permits scalable and adaptable analog circuit evaluation, significantly reducing the computational cost of design iterations. It enables order-of-magnitude speedups in RL-based sizing workflows, with generalization supported by the strict separation of topology and parameter representations.

A plausible implication is that such transformer-based surrogates could extend to broader classes of circuit generative and optimization tasks, provided training corpora are sufficiently expressive. Conversely, generalization is constrained by the diversity and abstraction level of graph representations; circuit types with fundamentally new structural motifs may require retraining or encoder architecture refinement.

ZeroSim represents a current state-of-the-art model for circuit metric surrogates, outperforming MLP, GNN, and graph transformer baselines in both accuracy and adaptability for analog amplifier topologies. Its integration within RL design automation loops provides substantial efficiency improvements over traditional simulation-driven optimization, and its generalizing architecture suggests applicability to new circuits without manual substructure engineering or fine-tuning.
