RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects (2405.17378v1)

Published 27 May 2024 in cs.LG and cs.AR

Abstract: LLMs have demonstrated potential in assisting with Register Transfer Level (RTL) design tasks. Nevertheless, there remains a significant gap in benchmarks that accurately reflect the complexity of real-world RTL projects. To address this, this paper presents RTL-Repo, a benchmark specifically designed to evaluate LLMs on large-scale RTL design projects. RTL-Repo includes a comprehensive dataset of more than 4000 Verilog code samples extracted from public GitHub repositories, with each sample providing the full context of the corresponding repository. We evaluate several state-of-the-art models on the RTL-Repo benchmark, including GPT-4, GPT-3.5, and Starcoder2, alongside Verilog-specific models like VeriGen and RTLCoder, and compare their performance in generating Verilog code for complex projects. The RTL-Repo benchmark provides a valuable resource for the hardware design community to assess and compare LLMs' performance in real-world RTL design scenarios and to train LLMs specifically for Verilog code generation in complex, multi-file RTL projects. RTL-Repo is open-source and publicly available on GitHub.

Authors

Ahmed Allam
Mohamed Shalan
