MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation (2407.01910v2)

Published 2 Jul 2024 in cs.LG, cs.AI, and cs.AR

Abstract: LLMs have recently shown promise in streamlining hardware design processes by encapsulating vast amounts of domain-specific data. In addition, they allow users to interact with the design processes through natural language instructions, thus making hardware design more accessible to developers. However, effectively leveraging LLMs in hardware design necessitates providing domain-specific data during inference (e.g., through in-context learning), fine-tuning, or pre-training. Unfortunately, existing publicly available hardware datasets are often limited in size, complexity, or detail, which hinders the effectiveness of LLMs in hardware design tasks. To address this issue, we first propose a set of criteria for creating high-quality hardware datasets that can effectively enhance LLM-assisted hardware design. Based on these criteria, we propose a Multi-Grained-Verilog (MG-Verilog) dataset, which encompasses descriptions at various levels of detail and corresponding code samples. To benefit the broader hardware design community, we have developed an open-source infrastructure that facilitates easy access, integration, and extension of the dataset to meet specific project needs. Furthermore, to fully exploit the potential of the MG-Verilog dataset, which varies in complexity and detail, we introduce a balanced fine-tuning scheme. This scheme serves as a unique use case to leverage the diverse levels of detail provided by the dataset. Extensive experiments demonstrate that the proposed dataset and fine-tuning scheme consistently improve the performance of LLMs in hardware design tasks.
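
As a hedged illustration of the multi-grained idea described above, the sketch below pairs one toy Verilog module with descriptions at several levels of detail and shows how such a dataset could be pulled in through the Hugging Face `datasets` library; the dataset identifier, split name, and field names are assumptions made for this example rather than the paper's published schema.

```python
# Sketch of a multi-grained Verilog sample: one code snippet paired with
# descriptions at several levels of detail. All field names here are
# illustrative assumptions, not the dataset's documented schema.
toy_sample = {
    "high_level_summary": "An 8-bit synchronous up-counter with active-high reset.",
    "detailed_description": (
        "On every rising clock edge the counter increments by one; "
        "when rst is asserted the count is cleared to zero on the next edge."
    ),
    "block_level_comments": [
        "Sequential always block triggered on posedge clk.",
        "Reset branch clears count; otherwise count increments.",
    ],
    "code": """
module counter #(parameter WIDTH = 8) (
    input  wire              clk,
    input  wire              rst,
    output reg  [WIDTH-1:0]  count
);
    always @(posedge clk) begin
        if (rst) count <= {WIDTH{1'b0}};
        else     count <= count + 1'b1;
    end
endmodule
""",
}

if __name__ == "__main__":
    # Hypothetical loading call; the repository name "GaTech-EIC/MG-Verilog"
    # is an assumption about where the dataset might be hosted.
    from datasets import load_dataset  # pip install datasets
    ds = load_dataset("GaTech-EIC/MG-Verilog", split="train")
    print(ds.column_names)
    print(toy_sample["high_level_summary"])
```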

Authors (5)
  1. Yongan Zhang (24 papers)
  2. Zhongzhi Yu (25 papers)
  3. Yonggan Fu (49 papers)
  4. Cheng Wan (48 papers)
  5. Yingyan Celine Lin (19 papers)
Citations (2)
