TcadGPT: Domain-Specific TCAD LLM

Updated 22 January 2026
  • TcadGPT is a domain-specific large language model designed for high-accuracy TCAD simulation code generation under low-resource conditions.
  • It implements a schema-first alignment and IR→DPO workflow to ensure both syntactic and semantic correctness in tool-compilable code.
  • Empirical results demonstrate TcadGPT’s superior QA accuracy and SDE script pass rates, with a reproducible framework that generalizes across scientific domains.

TcadGPT is a compact, domain-specialized LLM explicitly aligned for technology computer-aided design (TCAD) simulation tasks, distinguished by robust executability under low-resource conditions. Developed using a schema-first alignment paradigm, TcadGPT incorporates large-scale synthetic QA data, a code-centric IR→DPO optimization pipeline, and optional retrieval-augmented generation (RAG) to deliver high semantic accuracy and high SDE (Sentaurus Device Editor) script pass rates. The framework demonstrates state-of-the-art performance and generalizes to other scientific domains such as finite element analysis, providing a reproducible solution to tool-compilable code generation in data-scarce verticals (Wang et al., 15 Jan 2026).

1. System Architecture and Alignment Methodology

TcadGPT is instantiated with an LLaMA 3.1 backbone (8B parameters), fine-tuned via full-parameter optimization using LLaMA-Factory and DeepSpeed (Stage 3) on multi-GPU setups. Model inputs accommodate sequences up to 4,096 tokens (bf16 precision). The model’s distinguishing feature is its explicit schema-first alignment workflow:

  • Schema-first alignment: A domain-specific intermediate representation (IR) is defined a priori and directly introduced into the tuning loop via Direct Preference Optimization (DPO), encouraging conformity to schema-valid outputs.
  • DPO integration: A lightweight DPO head scores candidate continuations $s_\theta(x,\pi)$, with a pairwise logistic loss that refines output preference toward syntax-valid (i.e., tool-acceptable) code:

$$\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,\pi^+,\,\pi^-)} \left[\log\frac{e^{s_\theta(x,\,\pi^+)}}{e^{s_\theta(x,\,\pi^+)} + e^{s_\theta(x,\,\pi^-)}}\right]$$

Here $x$ is the user instruction, $\pi^+$ is the preferred code continuation rendered from the IR, and $\pi^-$ is a controlled violation containing a single injected error.
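The pairwise logistic loss above can be sketched directly. This is a minimal numerical illustration of the formula, not the paper's training code; in the real pipeline the scores come from the DPO head over model continuations.

```python
import math
import numpy as np

def dpo_pairwise_loss(s_pos, s_neg):
    """Pairwise logistic (Bradley-Terry) loss over preference pairs.

    s_pos, s_neg: head scores s_theta(x, pi+) and s_theta(x, pi-) for the
    schema-valid and single-error continuations of each instruction x.
    -log(e^{s+} / (e^{s+} + e^{s-})) = log(1 + e^{-(s+ - s-)}), computed
    in a numerically stable form via logaddexp.
    """
    margin = np.asarray(s_pos, dtype=float) - np.asarray(s_neg, dtype=float)
    return float(np.mean(np.logaddexp(0.0, -margin)))

# When the schema-valid continuation already outscores the violation,
# the loss is small; when the ranking is inverted, the loss is large.
low = dpo_pairwise_loss([3.0], [0.0])
high = dpo_pairwise_loss([0.0], [3.0])
```

Minimizing this loss pushes the head to rank IR-rendered, tool-acceptable code above its single-error variants, which is what drives the syntax-validity preference described above.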

A plausible implication is that schema-first DPO-head integration, without altering the model’s main architecture, provides robust control over both syntactic and semantic alignment in specialized scientific applications (Wang et al., 15 Jan 2026).

2. Synthetic Data Generation and Diversity

The construction of TcadGPT leverages a dual-pipeline approach for synthetic QA sample generation:

  • Pipeline 1 (Segment-Based): User guides, textbooks, and training manuals are parsed into sections, for which segment-based question–answer pairs are generated and aggressively paraphrased (tenfold diversification per prompt). Output: 340,000 Alpaca-format samples.
  • Pipeline 2 (Keyword-Guided): For each document segment, keywords are extracted and targeted questions are generated, yielding 1.2M highly granular QA pairs emphasizing command and parameter coverage.
  • Diversity controls: Paraphrasing, keyword-guided expansion, duplication avoidance, enforced JSON output, and formula/code inclusion mandates enhance sample variety and coverage.
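The dual-pipeline sample construction can be sketched as follows. The record fields follow the Alpaca format named in the text; the example questions, the "meta" field, and the two hand-written paraphrases (standing in for the tenfold LLM diversification) are illustrative assumptions.

```python
import hashlib
import json

def alpaca_record(instruction, output, source):
    # Alpaca format: instruction / input / output; "meta" is an
    # illustrative extra field for provenance tracking.
    return {"instruction": instruction, "input": "", "output": output,
            "meta": {"source": source}}

def dedup(records):
    # Duplication avoidance: hash normalized instruction text, keep first hit.
    seen, kept = set(), []
    for r in records:
        key = hashlib.sha256(r["instruction"].lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept

# Pipeline 1 (segment-based): one QA pair per document segment, paraphrased.
seg_questions = [
    "What does the SDE command (define-material ...) do?",
    "Explain the role of (define-material ...) in an SDE script.",
]
records = [alpaca_record(q, "It assigns a material to a named region ...",
                         "user_guide") for q in seg_questions]

# Pipeline 2 (keyword-guided): a targeted question for an extracted keyword.
records.append(alpaca_record(
    "Which SDE command defines a mesh refinement window?",
    "(sdedr:define-refinement-window ...)", "user_guide"))

records = dedup(records + records)  # exact duplicates collapse
print(json.dumps(records[0], indent=2))
```

The same dedup pass over both pipelines' outputs is one simple way to realize the duplication-avoidance control listed above.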

Table: Synthetic QA Data Composition

Source          Pipeline 1 (%)   Pipeline 2 (%)   Total (%)
User Guides           30               50             80
Textbooks             10                5             15
Training Docs          5                0              5

This approach enables foundational concept coverage and command-level syntax familiarity necessary for effective downstream alignment and code executability (Wang et al., 15 Jan 2026).

3. IR→DPO Workflow and Code Executability Optimization

Central to TcadGPT’s code synthesis competence is the IR→DPO training recipe:

  • IR extraction: Each verified SDE script is parsed into an IR capturing schema primitives such as dimensionality, up-direction, materials, geometry (with ordered Boolean operations), contacts, doping, mesh directives, and export commands.
  • Equivalence-preserving diversification: Each IR instance is expanded via systematic transformations:
    • Numeric jitter ($a' = a\cdot(1+\delta)$, $\delta \sim U(-\epsilon, \epsilon)$)
    • Boolean operation order commutation (when order is semantics-invariant)
    • Presence/absence toggling for mesh and export directives
    • Standardization of synonymous constructs (e.g., aliasing "global" and "all_regions")
  • DPO preference pair construction: For each instruction/code pair, variants are synthesized with single semantic errors (e.g., numeric scale errors, Boolean-op misordering, missing export) and validated via deterministic checkers.
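The diversification and preference-pair steps above can be sketched on a toy IR. The IR field names (`dimension`, `materials`, `geometry`, `export`) and the prompt text are hypothetical stand-ins for the schema primitives listed; the real checkers validate against the SDE compiler.

```python
import copy
import random

def jitter_numeric(ir, eps=0.05, rng=random.Random(0)):
    """Equivalence-preserving numeric jitter: a' = a * (1 + delta),
    delta ~ U(-eps, eps), applied to each geometry parameter."""
    out = copy.deepcopy(ir)
    out["geometry"] = {k: v * (1 + rng.uniform(-eps, eps))
                       for k, v in ir["geometry"].items()}
    return out

def make_negative(ir, rng=random.Random(1)):
    """Controlled single-error variant: either a numeric scale error or a
    dropped export directive -- exactly one violation per negative."""
    out = copy.deepcopy(ir)
    if rng.random() < 0.5 and out["geometry"]:
        key = sorted(out["geometry"])[0]
        out["geometry"][key] *= 1000.0   # unit/scale error
    else:
        out["export"] = False            # missing export command
    return out

ir = {"dimension": 3, "materials": ["Silicon"],
      "geometry": {"height_um": 0.5, "width_um": 1.0}, "export": True}
pos = jitter_numeric(ir)   # schema-valid "chosen" variant
neg = make_negative(ir)    # single-error "rejected" variant
pair = {"prompt": "Create a 1um x 0.5um silicon block and export it.",
        "chosen": pos, "rejected": neg}
```

Sweeping the jitter, commutation, and toggling transforms over every verified script is what yields the order-of-magnitude dataset expansion noted below.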

A plausible implication is that this code-centric, IR-driven contrastive framework not only increases the dataset by an order of magnitude but confers fine-grained control over what constitutes syntactic and semantic correctness according to strict SDE compiler specifications (Wang et al., 15 Jan 2026).

4. Retrieval-Augmented Generation and Controlled Evaluation

Retrieval-augmented generation (RAG) is optionally incorporated by embedding domain documentation with nomic-embed-text into a Chroma vector database. At inference, document chunks relevant to the user query are retrieved and appended in a structured prompt.
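The retrieve-then-prompt flow can be illustrated with a self-contained sketch. A deterministic bag-of-words embedder stands in for nomic-embed-text and an in-memory list stands in for Chroma, purely so the cosine-similarity retrieval and structured-prompt assembly are runnable; the document snippets are invented examples.

```python
import numpy as np

def tokens(text):
    return text.lower().replace("?", " ").replace(".", " ").split()

def embed(text, vocab):
    # Stand-in embedder: a normalized bag-of-words vector over a fixed vocab.
    vec = np.zeros(len(vocab))
    for tok in tokens(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n else vec

def retrieve(query, chunks, vocab, k=2):
    # Cosine similarity of unit vectors reduces to a dot product.
    q = embed(query, vocab)
    return sorted(chunks, key=lambda c: -float(embed(c, vocab) @ q))[:k]

def build_prompt(query, chunks, vocab):
    context = "\n".join(f"[doc] {c}" for c in retrieve(query, chunks, vocab))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = ["sdedr:define-refinement-window defines a mesh refinement box.",
        "define-material assigns a material to a named region.",
        "The up direction controls wafer orientation in an SDE script."]
query = "How do I set a mesh refinement box?"
vocab = {t: i for i, t in
         enumerate({t for d in docs + [query] for t in tokens(d)})}
top = retrieve(query, docs, vocab)
print(build_prompt(query, docs, vocab))
```

In the actual system the retrieved chunks come from the Chroma index rather than this toy ranking, but the prompt-assembly step is structurally the same.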

Empirical studies reveal:

  • RAG substantially improves general-purpose models (LLaMA +33.4 percentage points, DeepSeek V3 +17 points on a 264-QA TCAD benchmark), but
  • RAG degrades performance of TcadGPT (–21.6 points), attributed to overfitting to superficial document shorthands (e.g., equation indices) rather than engaging in schema-aligned reasoning.

Table: QA Accuracy With/Without RAG

Model           –RAG (%)   +RAG (%)
LLaMA 3.1 8B      24.6       58.0
DeepSeek V3       50.0       67.0
TcadGPT           85.6       64.0

This suggests that TcadGPT’s schema alignment renders external retrieval unnecessary or detrimental for task compliance in highly constrained domains (Wang et al., 15 Jan 2026).

5. Empirical Metrics and Comparative Evaluation

TcadGPT’s performance is established through a suite of empirically rigorous metrics:

  • QA Semantic Accuracy (TCAD 264 benchmark): 85.6% for TcadGPT (vs. 46.6% for GPT-4o, 50.0% DeepSeek V3). Subdomain analysis yields 93.9% for general physical models, 86.1% for simulation, and 84–94% for SDE/SDevice/SProcess.
  • SDE Syntax Pass Rate: On 20 held-out SDE instructions, Pass@1 is 65% (13/20), and Pass@3 is 80% (16/20) for TcadGPT; DeepSeek V3 achieves 0%. Passes by direct compilation without placeholder substitution are distinguished from those requiring template filling.
  • Comparative Executability Table
Model         Pass@1 (%)   Pass@3 (%)
TcadGPT          65.0         80.0
DeepSeek V3       0.0          0.0
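The empirical Pass@k figures above can be computed as follows. The per-instruction outcome lists are illustrative and constructed to match the reported 13/20 and 16/20 counts, not the paper's raw data.

```python
def pass_at_k(results, k):
    """Empirical Pass@k: fraction of instructions with at least one
    tool-compilable script among the first k sampled generations.

    results: list of per-instruction lists of booleans
             (True = script compiles in SDE).
    """
    hits = sum(any(r[:k]) for r in results)
    return hits / len(results)

# 20 held-out instructions: 13 compile on the first attempt,
# 3 more succeed only by the third attempt, 4 never compile.
results = ([[True, True, True]] * 13
           + [[False, False, True]] * 3
           + [[False, False, False]] * 4)
p1 = pass_at_k(results, 1)   # 13/20
p3 = pass_at_k(results, 3)   # 16/20
```

A stricter variant would track, per attempt, whether the pass was by direct compilation or required placeholder substitution, matching the distinction drawn in the evaluation.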

These metrics provide evidence of both high conceptual understanding and robust SDE script generation under strict executability constraints (Wang et al., 15 Jan 2026).

6. Generalization and Portability

TcadGPT’s schema-first alignment and IR→DPO pipeline generalize to structurally similar scientific domains. Applying the same recipe to the Elmer open-source FEM solver, comparable gains are observed:

  • QA (Elmer, 100 questions): Elmer-adapted 8B model achieves 52% vs. DeepSeek V3.2 at 34% and GPT-4o at 32%.
  • Pass@1 (Elmer code, 20 instructions): Elmer-8B+DPO: 12/20 (60%); DeepSeek V3.2: 7/20 (35%); GPT-4o: 4/20 (20%).

This suggests that the framework’s IR→DPO methodology is domain-agnostic provided a well-defined intermediate schema and documentation base exist (Wang et al., 15 Jan 2026).

7. Reproducibility, Artifacts, and Broader Significance

All resources—synthetic QA datasets, IR→DPO modules, evaluation benchmarks, and model checkpoints—are released publicly (https://github.com/wddddds1/TcadGPT), facilitating reproducibility and further domain extensions. Released assets include:

  • 1.5M synthetic QA samples (340K Pipeline 1; 1.2M Pipeline 2)
  • 264-question TCAD QA benchmark, 20-instruction SDE test set
  • Elmer QA and code test sets
  • Modular scripts for QA generation, IR extraction/diversification, and DPO pair construction

The overall framework indicates a reproducible, scalable path for constructing executable, schema-controlled LLMs in computational science domains with substantial data scarcity and strict compilability constraints (Wang et al., 15 Jan 2026).
