AnalogSeeker: Analog Circuit Design LLM
- AnalogSeeker is an open-source language model specifically designed for analog circuit design using a curated corpus and multi-agent QTSA framework.
- It employs granular knowledge distillation by decomposing textbook content into exam-style Q–A pairs to enhance training effectiveness.
- The model utilizes Neighborhood Self-Constrained Supervised Fine-Tuning (NSC-SFT) to balance domain adaptation with retention of foundational LLM capabilities, achieving 85.04% accuracy on the AMSBench-TQA benchmark.
AnalogSeeker is an open-source foundation LLM developed specifically to address the unique data scarcity, knowledge complexity, and automation requirements of analog circuit design. Built atop a large-scale, high-quality textual corpus curated from canonical analog circuit textbooks, AnalogSeeker employs a multi-agent granular knowledge distillation method and a principled fine-tuning-centric training paradigm, introducing innovations in both training methodology and dataset construction. The model achieves state-of-the-art accuracy on dedicated analog knowledge benchmarks and demonstrates practical downstream utility in complex analog design tasks. AnalogSeeker is freely available for research at https://huggingface.co/analogLLM/analogseeker (Chen et al., 14 Aug 2025).
1. Domain-Specific Corpus Collection
AnalogSeeker’s core is rooted in a systematically assembled “textual domain corpus” that comprehensively represents the analog circuit body of knowledge. The framework for corpus collection is explicitly structured into four ascending stages:
- Circuit theory: Covers foundational passive network laws, network analysis, and both time and frequency domain analysis.
- Analog circuit basis: Encompasses device characteristics (e.g., MOSFET, BJT), fundamental amplifiers, feedback theory, and stability analysis.
- Analog integrated circuits: Focuses on concrete modules such as operational amplifiers, comparators, and integrated CMOS design practices.
- Advanced circuit topics: Addresses specialized modules such as phase-locked loops (PLLs).
Twenty canonical textbooks spanning at least a dozen key analog circuit types were curated and segmented both by learning stage and circuit class. A commercial OCR and structure extraction pipeline (Mathpix) yielded a Markdown corpus of 7.26 million tokens, hierarchically structured by chapter and subsection, with explicit extraction and anonymization of mathematical expressions and figures.
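The segmentation step can be illustrated with a minimal sketch: the snippet below splits one Mathpix-style Markdown book into chapter/subsection "learning nodes". The heading regex, the `LearningNode` fields, and the `segment_markdown` helper are illustrative assumptions, not the paper's released tooling.

```python
import re
from dataclasses import dataclass

@dataclass
class LearningNode:
    """One subsection of the textbook corpus (~2,000 tokens each in the paper)."""
    book: str
    chapter: str
    subsection: str
    text: str

# Hypothetical heading pattern for Mathpix-style Markdown: '#' = chapter, '##'/'###' = subsection.
HEADING = re.compile(r"^(#{1,3})\s+(.*)$")

def segment_markdown(book_title: str, markdown: str) -> list[LearningNode]:
    """Split one book's Markdown into chapter/subsection learning nodes."""
    nodes, chapter, subsection, buffer = [], "", "", []

    def flush():
        # Emit the buffered subsection text as a learning node, if any.
        if subsection and buffer:
            nodes.append(LearningNode(book_title, chapter, subsection, "\n".join(buffer).strip()))
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level, title = len(match.group(1)), match.group(2).strip()
            if level == 1:          # chapter heading
                chapter, subsection = title, ""
            else:                   # subsection heading starts a new learning node
                subsection = title
        else:
            buffer.append(line)
    flush()
    return nodes
```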
2. Granular Knowledge Distillation via Multi-Agent Framework
To translate the dense, multifaceted textbook information into a machine-learnable supervision source, AnalogSeeker introduces a “learning node”–driven decomposition. Each subsection of the corpus (typically ~2000 tokens; 2,698 nodes in total) is transformed by a multi-agent framework into a dataset of question–answer pairs that explicitly encode reasoning steps.
The QTSA (Question–Thinking–Solution–Answer) format is employed:
- Q(i): Agent Ψ_Q generates an exam-style question directly from the learning node.
- T(i) and S(i): Agent Ψ_A, given the node and Q(i), outputs a detailed reasoning trace (<think>…</think>) followed by sequential solution steps (<solution>…</solution>).
- A(i): Agent Ψ_A concludes with the final answer (<answer>…</answer>).
Each node is sampled Nₛ = 5 times, and the resulting records are standardized by a post-processing agent Ψ_P to remove formatting inconsistencies and over-specific textbook references. This yields a fine-grained, explicit, and high-quality supervised training dataset of 112.65M tokens.
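A hedged sketch of this distillation loop follows: three agent calls per sample realize Ψ_Q, Ψ_A, and Ψ_P, and each node is sampled Nₛ = 5 times. The `chat` helper, the prompt wording, and the record schema are hypothetical placeholders for whatever LLM backend the actual pipeline uses.

```python
N_SAMPLES = 5  # N_s in the paper: QTSA records drawn per learning node

def chat(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for a single LLM call; wire this to whatever backend is available."""
    raise NotImplementedError

def distill_node(node_text: str) -> list[dict]:
    """Turn one learning node into N_SAMPLES QTSA records via three agent roles."""
    records = []
    for _ in range(N_SAMPLES):
        # Psi_Q: exam-style question grounded in the learning node
        question = chat(
            "You write exam-style questions for analog circuit design courses.",
            f"Write one exam question based on this material:\n{node_text}")
        # Psi_A: reasoning trace, solution steps, and final answer in QTSA tags
        qtsa = chat(
            "You are an analog circuit expert. Respond in "
            "<think>...</think><solution>...</solution><answer>...</answer> format.",
            f"Reference material:\n{node_text}\n\nQuestion:\n{question}")
        # Psi_P: standardize tags and strip over-specific textbook references
        cleaned = chat(
            "You normalize QTSA records: fix malformed tags and remove page/figure references.",
            qtsa)
        records.append({"question": question, "response": cleaned})
    return records
```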
3. Model Architecture and Training Paradigm
AnalogSeeker fine-tunes the Qwen2.5-32B-Instruct LLM, adopting a training strategy informed by both corpus size and domain complexity:
- Fine-tuning–centric paradigm: Given the modest size of the textbook-derived unsupervised corpus, the approach eschews continual pre-training (CPT) in favor of focused supervised fine-tuning (SFT) on the distilled QTSA data; the difference in supervision is sketched after this list. Experiments show <1% absolute improvement from CPT+SFT versus SFT alone, indicating that classical pre-training is not cost-effective at this corpus scale.
- Instruct Model Preference: “Instruct” LLMs (e.g., Qwen2.5-32B-Instruct) proved more robust to further fine-tuning than “reasoning models,” with reward-optimized parameters exhibiting fragility during domain adaptation.
- Empirical and theoretical validation: Ablations confirm that SFT is the main contributor to analog circuit knowledge transfer, and that instruct models balance adaptability and capacity retention.
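The practical difference between the two supervision regimes can be made concrete: CPT would apply next-token loss over every token of the raw corpus, whereas SFT masks the prompt so that only the distilled QTSA response is predicted. The label-masking convention below follows standard Hugging Face practice (ignore index −100) and is an implementation assumption, not a detail taken from the paper.

```python
import torch

IGNORE_INDEX = -100  # positions with this label are excluded from the cross-entropy loss

def build_sft_labels(prompt_ids: list, response_ids: list) -> dict:
    """SFT supervision: the model is trained to predict only the QTSA response tokens.
    CPT, by contrast, would apply next-token loss over every token of the raw corpus."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return {
        "input_ids": torch.tensor(input_ids, dtype=torch.long),
        "labels": torch.tensor(labels, dtype=torch.long),
    }
```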
4. Neighborhood Self-Constrained Supervised Fine-Tuning (NSC-SFT)
To ensure effective domain adaptation without catastrophic forgetting of foundational LLM capabilities, AnalogSeeker employs NSC-SFT, which regularizes the fine-tuning trajectory:
- Loss formulation (a minimal implementation sketch follows this list):

  $$\mathcal{L}_{\text{NSC-SFT}} = \mathcal{L}_{\text{CE}} + \lambda\,\mathcal{L}_{\text{KL}}$$

  - $\mathcal{L}_{\text{CE}}$: cross-entropy loss for standard supervised learning on the distilled QTSA data.
  - $\mathcal{L}_{\text{KL}}$: Kullback–Leibler divergence between the current (fine-tuned) output distribution $\pi_\theta$ and that of the frozen reference (pre-trained) model $\pi_{\text{ref}}$, computed over the entire output vocabulary $\mathcal{V}$ at each input position $t$:

  $$\mathcal{L}_{\text{KL}} = \sum_{t}\,\sum_{v \in \mathcal{V}} \pi_\theta(v \mid x_{<t}) \log \frac{\pi_\theta(v \mid x_{<t})}{\pi_{\text{ref}}(v \mid x_{<t})}$$

  - $\lambda$ is a tunable hyperparameter controlling the strength of the self-constraint.
- Engineering realization: Memory-peak optimization is required for 32B models with long contexts (8192 tokens), accomplished by asymmetric memory management (reference model resident on every GPU, target model distributed with DeepSpeed ZeRO-3) and careful tensor deletion.
- Convergence guarantee: Standard analysis of smooth composite losses guarantees convergence to stationary points for sufficiently small learning rates (on the order of $1/L$, where $L$ is the Lipschitz constant of the loss gradient).
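Below is a minimal PyTorch sketch of one NSC-SFT loss evaluation, assuming Hugging Face-style causal LMs for both the trainable target and the frozen reference. The λ value and the masking of the KL term to supervised positions are illustrative choices; the memory layout described above (reference replicated per GPU, target sharded with DeepSpeed ZeRO-3) is left to the training framework.

```python
import torch
import torch.nn.functional as F

def nsc_sft_loss(target_model, ref_model, input_ids, attention_mask, labels, lam=0.1):
    """One NSC-SFT loss evaluation: supervised cross-entropy plus a KL self-constraint
    that keeps the fine-tuned distribution close to the frozen reference model.
    `lam` (lambda) is illustrative; the paper treats it as a tunable hyperparameter."""
    out = target_model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    ce_loss = out.loss                                  # standard SFT cross-entropy

    with torch.no_grad():                               # reference model stays frozen
        ref_logits = ref_model(input_ids=input_ids, attention_mask=attention_mask).logits

    # KL(pi_theta || pi_ref) over the full vocabulary at every supervised position
    log_p = F.log_softmax(out.logits, dim=-1)                 # fine-tuned distribution
    log_q = F.log_softmax(ref_logits, dim=-1)                 # reference distribution
    token_kl = (log_p.exp() * (log_p - log_q)).sum(dim=-1)    # [batch, seq_len]
    mask = (labels != -100).float()
    kl_loss = (token_kl * mask).sum() / mask.sum().clamp(min=1.0)

    return ce_loss + lam * kl_loss
```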
5. Performance and Evaluation Benchmarks
AnalogSeeker establishes state-of-the-art performance on AMSBench-TQA, a benchmark designed for textual QA in analog circuit knowledge:
- Accuracy: 85.04%, an absolute improvement of 15.67 percentage points over the Qwen2.5-32B-Instruct baseline; AnalogSeeker also outperforms both reasoning LLMs (QwQ-32B, 81.54%) and commercial models (GPT-4o, 73.99%; DeepSeek-v3, 84.41%).
On downstream operational amplifier design within the Atelier framework:
- Capabilities: Iterative topology design, topology modification (e.g., introducing series nulling resistors to improve phase margin), and expert-level circuit analysis in natural language.
- Trajectory documentation: Full design trajectories detail the agent’s reasoning and decision-making steps, including handling of phase margin and output swing.
6. Open Research Resource and Impact
AnalogSeeker is open-sourced for research use on HuggingFace (https://huggingface.co/analogLLM/analogseeker). This public availability:
- Facilitates reproducibility by providing access to the model weights, the QTSA dataset construction protocol, and training recipes (a minimal loading sketch follows this list).
- Enables domain adaptation and integration, e.g., fine-tuning for particular analog subfields or integration with frameworks such as Atelier.
- Accelerates research in analog EDA automation, especially in data-constrained domains where classical LLMs lack sufficient prior.
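For example, the released checkpoint can be loaded with standard Hugging Face Transformers. The sketch below assumes the model follows the usual Qwen2.5-Instruct chat template; the prompt text is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "analogLLM/analogseeker"  # open-source release referenced in the text

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

# Illustrative prompt; the chat template is assumed to follow the Qwen2.5-Instruct convention.
messages = [{"role": "user",
             "content": "Explain how a series nulling resistor improves the phase margin "
                        "of a two-stage Miller-compensated op-amp."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```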
This approach represents a marked advance in the methodological rigor of analog circuit LLM development, coupling fine-grained textbook data, explicit multi-agent knowledge distillation, purposefully constrained fine-tuning, and benchmarked validation.
7. Summary Table: AnalogSeeker Foundation Model
| Feature | Description | Quantitative Result/Value |
|---|---|---|
| Corpus size | Curated textbook corpus; Markdown, 7.26M tokens | 20 books, 2,698 learning nodes |
| Knowledge distillation format | Multi-agent QTSA (Question–Thinking–Solution–Answer) | 112.65M tokens of distilled SFT data |
| Model base | Qwen2.5-32B-Instruct | 32B parameters, 8192-token context |
| Fine-tuning method | NSC-SFT (SFT with a KL divergence self-constraint) | |
| AMSBench-TQA accuracy | Analog circuit QA benchmark | 85.04% |
| Improvement over baselines | vs. Qwen2.5-32B-Instruct, DeepSeek-v3, and GPT-4o | +15.67, +0.63, and +11.05 points |
| Availability | Open-source release on HuggingFace | https://huggingface.co/analogLLM/analogseeker |
The AnalogSeeker initiative provides both a rigorous engineering exemplar and a practical, open tool for the analog circuit design research community, demonstrating that domain-specific LLMs can close performance gaps in fields with structural data scarcity and compositional complexity.