Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework (2403.11202v2)
Abstract: Recent advances in large language models (LLMs) have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have used fine-tuning to enhance the abilities of these LLMs in the field of chip design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by LLMs. Additionally, the absence of a Verilog and Electronic Design Automation (EDA) script data augmentation framework significantly increases the time required to prepare training datasets for LLM trainers. This paper proposes an automated design-data augmentation framework that generates large volumes of high-quality natural language aligned with Verilog and EDA scripts. For Verilog generation, it translates Verilog files into abstract syntax trees and then maps AST nodes to natural language using predefined templates. For Verilog repair, it uses predefined rules to inject errors into correct Verilog files and then pairs EDA tool feedback with the correct and erroneous versions. For EDA script generation, it uses an existing LLM (GPT-3.5) to obtain natural language descriptions of the scripts. To evaluate the effectiveness of our data augmentation method, we fine-tune Llama2-13B and Llama2-7B models on the dataset generated by our augmentation framework. The results demonstrate a significant improvement on Verilog generation tasks. Moreover, the accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% on the same benchmark. Our 13B model (ChipGPT-FT) achieves a higher pass rate than GPT-3.5 on Verilog generation and outperforms it on EDA script (i.e., SiliconCompiler) generation with only 200 EDA script samples.
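To make the two Verilog-side augmentation paths concrete, below is a minimal, self-contained Python sketch. It is not the paper's implementation: the framework presumably builds a full abstract syntax tree with a real Verilog parser, whereas this sketch uses a lightweight regex extraction as a stand-in for the AST. The template text, the `describe_module` and `inject_error` helpers, and the single mutation rule are hypothetical illustrations of the node-to-template mapping and rule-based error injection described in the abstract.

```python
import re

# Sketch of the two augmentation paths from the abstract, under the assumption
# that a regex-based extraction can stand in for a real Verilog AST.

MODULE_RE = re.compile(r"module\s+(\w+)\s*\((.*?)\);", re.S)
PORT_RE = re.compile(r"(input|output)\s+(?:wire\s+|reg\s+)?(?:\[(\d+):(\d+)\]\s*)?(\w+)")

# Predefined (hypothetical) template for mapping extracted nodes to natural language.
DESCRIPTION_TEMPLATE = (
    "Design a Verilog module named '{name}' with inputs {inputs} and outputs {outputs}."
)

def describe_module(verilog_src: str) -> str:
    """Map (pseudo-)AST nodes to a natural-language description via a template."""
    header = MODULE_RE.search(verilog_src)
    name, port_block = header.group(1), header.group(2)
    inputs, outputs = [], []
    for direction, hi, lo, port in PORT_RE.findall(port_block):
        width = f"{int(hi) - int(lo) + 1}-bit " if hi else "1-bit "
        (inputs if direction == "input" else outputs).append(width + port)
    return DESCRIPTION_TEMPLATE.format(
        name=name, inputs=", ".join(inputs), outputs=", ".join(outputs)
    )

def inject_error(verilog_src: str) -> str:
    """One illustrative mutation rule: replace the first '=' with '<=' so that
    an EDA tool (lint/compile) will flag the mutated file."""
    return verilog_src.replace(" = ", " <= ", 1)

if __name__ == "__main__":
    src = """module adder(input [3:0] a, input [3:0] b, output [4:0] sum);
  assign sum = a + b;
endmodule"""
    print(describe_module(src))   # natural-language prompt paired with `src`
    wrong = inject_error(src)     # erroneous counterpart for the repair task
```

In the repair setting, the mutated file would then be run through an EDA tool, and the tool's diagnostic output would be paired with the correct and erroneous versions to form a training sample.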
- D. Hernandez, J. Kaplan, T. Henighan, and S. McCandlish, “Scaling laws for transfer,” arXiv preprint arXiv:2102.01293, 2021.
- S. Thakur, B. Ahmad, Z. Fan, H. Pearce, B. Tan, R. Karri, B. Dolan-Gavitt, and S. Garg, “Benchmarking large language models for automated verilog rtl code generation,” in 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1–6, IEEE, 2023.
- H. Pearce, B. Tan, and R. Karri, “Dave: Deriving automatically verilog from english,” in Proceedings of the 2020 ACM/IEEE Workshop on Machine Learning for CAD (MLCAD), pp. 27–32, 2020.
- K. Chang, Y. Wang, H. Ren, M. Wang, S. Liang, Y. Han, H. Li, and X. Li, “Chipgpt: How far are we from natural language hardware design,” arXiv preprint arXiv:2305.14019, 2023.
- Z. He, H. Wu, X. Zhang, X. Yao, S. Zheng, H. Zheng, and B. Yu, “Chateda: A large language model powered autonomous agent for eda,” arXiv preprint arXiv:2308.10204, 2023.
- Y. Wei, Z. Wang, Y. Lu, C. Xu, C. Liu, H. Zhao, S. Chen, and Y. Wang, “Editable scene simulation for autonomous driving via collaborative llm-agents,” 2024.
- B. Jin, X. Liu, Y. Zheng, P. Li, H. Zhao, T. Zhang, Y. Zheng, G. Zhou, and J. Liu, “Adapt: Action-aware driving caption transformer,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 7554–7561, 2023.
- M. Liu, T.-D. Ene, et al., “Chipnemo: Domain-adapted llms for chip design,” arXiv preprint arXiv:2311.00176, 2023.
- S. Thakur, B. Ahmad, H. Pearce, B. Tan, B. Dolan-Gavitt, R. Karri, and S. Garg, “Verigen: A large language model for verilog code generation,” arXiv preprint arXiv:2308.00708, 2023.
- A. Olofsson, W. Ransohoff, and N. Moroze, “A distributed approach to silicon compilation: Invited,” in Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC), pp. 1343–1346, 2022.
- Y. Lu, S. Liu, Q. Zhang, and Z. Xie, “Rtllm: An open-source benchmark for design rtl generation with large language model,” in Asia and South Pacific Design Automation Conference (ASP-DAC), 2023.
- M. Liu, N. Pinckney, B. Khailany, and H. Ren, “VerilogEval: evaluating large language models for verilog code generation,” in 2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2023.
- J. Blocklove, S. Garg, R. Karri, and H. Pearce, “Chip-chat: Challenges and opportunities in conversational hardware design,” arXiv preprint arXiv:2305.13243, 2023.
- Z. Liang, J. Cheng, R. Yang, H. Ren, Z. Song, D. Wu, X. Qian, T. Li, and Y. Shi, “Unleashing the potential of llms for quantum computing: A study in quantum architecture design,” arXiv preprint arXiv:2307.08191, 2023.
- Z. Yan, Y. Qin, X. S. Hu, and Y. Shi, “On the viability of using llms for sw/hw co-design: An example in designing cim dnn accelerators,” arXiv preprint arXiv:2306.06923, 2023.
- B. Ahmad, S. Thakur, B. Tan, R. Karri, and H. Pearce, “Fixing hardware security bugs with large language models,” arXiv preprint arXiv:2302.01215, 2023.
- R. Kande, H. Pearce, B. Tan, B. Dolan-Gavitt, S. Thakur, R. Karri, and J. Rajendran, “Llm-assisted generation of hardware assertions,” arXiv preprint arXiv:2306.14027, 2023.
- M. Orenes-Vera, M. Martonosi, and D. Wentzlaff, “From rtl to sva: Llm-assisted generation of formal verification testbenches,” 2023.
- Y. Fu, Y. Zhang, Z. Yu, S. Li, Z. Ye, C. Li, C. Wan, and Y. Lin, “Gpt4aigchip: Towards next-generation ai accelerator design automation via large language models,” in 2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2023.
- O. H. Hamid, “From model-centric to data-centric ai: A paradigm shift or rather a complementary approach?,” in 2022 8th International Conference on Information Technology Trends (ITT), pp. 196–199, IEEE, 2022.
- S. Yu, T. Wang, and J. Wang, “Data augmentation by program transformation,” Journal of Systems and Software, vol. 190, p. 111304, 2022.
- J. Dodge, G. Ilharco, R. Schwartz, A. Farhadi, H. Hajishirzi, and N. Smith, “Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping,” arXiv preprint arXiv:2002.06305, 2020.
- E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” in International Conference on Learning Representations (ICLR), 2022.
- H. Touvron, L. Martin, K. R. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. M. Bikel, L. Blecher, C. C. Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. S. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. M. Kloumann, A. V. Korenev, P. S. Koura, M.-A. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, and T. Scialom, “Llama 2: Open foundation and fine-tuned chat models,” ArXiv, vol. abs/2307.09288, 2023.
- Kaiyan Chang
- Kun Wang
- Nan Yang
- Ying Wang
- Dantong Jin
- Wenlong Zhu
- Zhirong Chen
- Cangyuan Li
- Hao Yan
- Yunhao Zhou
- Zhuoliang Zhao
- Yuan Cheng
- Yudong Pan
- Yiqi Liu
- Mengdi Wang
- Shengwen Liang
- Huawei Li
- Xiaowei Li
- Yinhe Han