OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis (2110.11292v1)

Published 21 Oct 2021 in cs.LG, cs.AI, cs.SY, and eess.SY

Abstract: Logic synthesis is a challenging and widely-researched combinatorial optimization problem during integrated circuit (IC) design. It transforms a high-level description of hardware in a programming language like Verilog into an optimized digital circuit netlist, a network of interconnected Boolean logic gates, that implements the function. Spurred by the success of ML in solving combinatorial and graph problems in other domains, there is growing interest in the design of ML-guided logic synthesis tools. Yet, there are no standard datasets or prototypical learning tasks defined for this problem domain. Here, we describe OpenABC-D, a large-scale, labeled dataset produced by synthesizing open source designs with a leading open-source logic synthesis tool and illustrate its use in developing, evaluating and benchmarking ML-guided logic synthesis. OpenABC-D has intermediate and final outputs in the form of 870,000 And-Inverter-Graphs (AIGs) produced from 1500 synthesis runs plus labels such as the optimized node counts and delay. We define a generic learning problem on this dataset and benchmark existing solutions for it. The codes related to dataset creation and benchmark models are available at https://github.com/NYU-MLDA/OpenABC.git. The dataset generated is available at https://archive.nyu.edu/handle/2451/63311

Authors (4)
  1. Animesh Basak Chowdhury (15 papers)
  2. Benjamin Tan (42 papers)
  3. Ramesh Karri (92 papers)
  4. Siddharth Garg (99 papers)
Citations (29)

Summary

Overview of OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis

The paper discusses the development and significance of OpenABC-D, a comprehensive dataset tailored for Machine Learning (ML) applications in logic synthesis during Integrated Circuit (IC) design. The dataset encompasses 870,000 And-Inverter-Graphs (AIGs) derived from 1500 synthesis recipes applied across 29 open-source hardware intellectual properties (IPs), and serves as a resource for evaluating and benchmarking ML frameworks for logic synthesis optimization.

Logic Synthesis as a Combinatorial Optimization Challenge

Logic synthesis is the initial and crucial step in converting a high-level IC design into an optimized gate-level netlist, a transformation pivotal for meeting the constraints of area, power, and delay in an IC layout. Due to the NP-hard complexity inherent in this task, it is typically handled via heuristics that simplify and minimize logic gate configurations. OpenABC-D positions itself as a benchmark dataset addressing this complexity, aiming to assist in the evaluation of ML methods proposed for guiding logic synthesis steps and strategies.
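
To make the optimization objective concrete, the sketch below builds a tiny And-Inverter Graph by hand in Python. It is purely illustrative and is not the paper's data format: each node is a 2-input AND gate, each fanin edge carries an inversion flag, and the AND-node count is one of the quality metrics that synthesis heuristics try to minimize.

```python
# Minimal illustrative AIG (not the OpenABC-D storage format): every node is a
# 2-input AND gate; a fanin is a (source, inverted) pair.
from dataclasses import dataclass

@dataclass(frozen=True)
class AndNode:
    fanin0: tuple  # (source name, inverted?)
    fanin1: tuple

# Half adder: carry = a AND b; sum = a XOR b, expressed with ANDs + inversions.
aig = {
    "n1": AndNode(("a", False), ("b", True)),    # a AND NOT b
    "n2": AndNode(("a", True),  ("b", False)),   # NOT a AND b
    "n3": AndNode(("n1", True), ("n2", True)),   # NOT n1 AND NOT n2
    "carry": AndNode(("a", False), ("b", False)),
}
outputs = {"sum": ("n3", True), "carry": ("carry", False)}

# The AND-node count (and the graph depth) are typical QoR metrics that
# optimization passes try to reduce.
print("AND-node count:", len(aig))
```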

Motivation and Dataset Characteristics

The impetus for creating OpenABC-D lies in the absence of a standard, large-scale dataset for ML-driven electronic design automation (EDA) tasks. Prior works on ML-guided logic synthesis often relied on disparate and limited datasets, impeding consistent benchmarking. OpenABC-D fills this gap, providing a unified dataset that reflects real-world IC design challenges and enables direct comparison of ML approaches. It captures a wide range of synthesis scenarios, with individually labeled data samples spanning intermediate and final synthesis outputs across different hardware IPs.

Data Generation and Framework

The authors outline an open-source workflow for generating the dataset. Using the open-source tools Yosys and ABC for the synthesis runs, the framework converts IPs described in hardware description languages such as Verilog into AIG structures, which are then processed into data samples compatible with ML models. Generating the dataset required substantial compute time, reflecting the scale and variety of synthesis runs it captures.
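
As a rough illustration of one synthesis run in such a flow, the sketch below drives Yosys and ABC from Python. It assumes both tools are on the PATH, uses a placeholder design.v, and applies an illustrative sequence of ABC passes; it is not the exact script or recipe set used to build OpenABC-D.

```python
# Sketch of a single Verilog -> AIG -> optimized AIG run (illustrative only).
import subprocess

def to_aig(verilog_path: str, aig_path: str) -> None:
    """Convert a Verilog design to an And-Inverter Graph with Yosys."""
    script = f"read_verilog {verilog_path}; synth -flatten; aigmap; write_aiger {aig_path}"
    subprocess.run(["yosys", "-p", script], check=True)

def run_recipe(aig_path: str, out_path: str, recipe: list[str]) -> str:
    """Apply a sequence of ABC optimization passes and return the stats output."""
    cmds = f"read {aig_path}; strash; " + "; ".join(recipe) + f"; write {out_path}; print_stats"
    result = subprocess.run(["abc", "-c", cmds], check=True,
                            capture_output=True, text=True)
    return result.stdout  # contains the optimized node count and depth

if __name__ == "__main__":
    to_aig("design.v", "design.aig")  # design.v is a placeholder input
    stats = run_recipe("design.aig", "design_opt.aig",
                       ["balance", "rewrite", "refactor", "balance", "rewrite -z"])
    print(stats)
```

Repeating this loop over many IPs and many randomized pass orderings is, in spirit, how a labeled corpus of intermediate and final AIGs can be assembled.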

ML Tasks and Benchmarking

A highlight of the paper is the use of graph convolutional networks (GCNs) to predict the Quality of Result (QoR) of synthesis recipes. The dataset facilitates tasks such as predicting the QoR of unseen recipes on a given IP and comparing the performance characteristics of candidate recipes. The results demonstrate the efficacy of GCNs in capturing and predicting complex interactions within synthesis flows, indicating substantial potential for improving synthesis recipes through informed ML models.
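
The sketch below shows one plausible shape for such a QoR regressor using PyTorch Geometric: GCN layers over the AIG, mean pooling to a graph embedding, and a dense head that also consumes a vector encoding of the recipe. The feature layout and the recipe_vec encoding are assumptions for illustration, not the paper's benchmark architecture.

```python
# Illustrative graph-level QoR regressor (assumes AIGs are available as
# torch_geometric Data objects with node features `x` and `edge_index`).
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class QoRPredictor(torch.nn.Module):
    def __init__(self, in_dim: int, recipe_dim: int, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Sequential(
            torch.nn.Linear(hidden + recipe_dim, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1),  # predicted QoR, e.g. optimized node count
        )

    def forward(self, data, recipe_vec):
        # Message passing over the AIG structure.
        h = F.relu(self.conv1(data.x, data.edge_index))
        h = F.relu(self.conv2(h, data.edge_index))
        # One embedding per graph, concatenated with the recipe encoding.
        g = global_mean_pool(h, data.batch)
        return self.head(torch.cat([g, recipe_vec], dim=-1)).squeeze(-1)
```

Trained with a regression loss against the labeled node counts or delays, a model of this kind can rank candidate recipes without running the full synthesis flow for each one.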

Implications and Future Directions

This work paves the way for a standardized approach to evaluating ML models in the EDA domain. It opens avenues for further research, particularly in developing more nuanced models that can generalize across diverse IC designs. Future work might focus on extending OpenABC-D by incorporating more industrial-scale IPs and exploring domain adaptation techniques capable of predicting design characteristics across different technology nodes and design paradigms.

OpenABC-D is a step forward in enhancing collaboration between the EDA and AI communities, potentially speeding up design cycles and reducing the computational cost of IC synthesis. By making the dataset and related code publicly accessible, the authors foster reproducible research and encourage future advances in ML applications within logic synthesis.