HyGCN: A GCN Accelerator with Hybrid Architecture (2001.02514v1)

Published 7 Jan 2020 in cs.DC

Abstract: In this work, we first characterize the hybrid execution patterns of GCNs on Intel Xeon CPU. Guided by the characterization, we design a GCN accelerator, HyGCN, using a hybrid architecture to efficiently perform GCNs. Specifically, first, we build a new programming model to exploit the fine-grained parallelism for our hardware design. Second, we propose a hardware design with two efficient processing engines to alleviate the irregularity of Aggregation phase and leverage the regularity of Combination phase. Besides, these engines can exploit various parallelism and reuse highly reusable data efficiently. Third, we optimize the overall system via inter-engine pipeline for inter-phase fusion and priority-based off-chip memory access coordination to improve off-chip bandwidth utilization. Compared to the state-of-the-art software framework running on Intel Xeon CPU and NVIDIA V100 GPU, our work achieves on average 1509$\times$ speedup with 2500$\times$ energy reduction and average 6.5$\times$ speedup with 10$\times$ energy reduction, respectively.

Citations (263)

Summary

  • The paper introduces a novel edge- and MVM-centric programming model that exploits fine-grained parallelism in graph convolutional networks.
  • It employs interval-shard partitioning and dynamic sparsity elimination to optimize the irregular aggregation phase and enhance performance.
  • It integrates multi-granular systolic arrays and an inter-engine pipeline to significantly reduce energy consumption and boost computational throughput.

Overview of "HyGCN: A GCN Accelerator with Hybrid Architecture"

The research paper "HyGCN: A GCN Accelerator with Hybrid Architecture" proposes an architecture aimed specifically at accelerating Graph Convolutional Networks (GCNs), which are increasingly used for processing graph-structured data. GCNs consist of two main computational phases: Aggregation and Combination. The Aggregation phase is characterized by a dynamic and irregular execution pattern, akin to graph processing, while the Combination phase exhibits a more static and regular execution pattern, similar to neural network processing. This bifurcation necessitates a hybrid design capable of optimizing both execution patterns. However, existing architectures inadequately address these diverse demands, leaving substantial room for architectural innovation.
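
To make the contrast concrete, here is a minimal NumPy sketch of one GCN layer split into the two phases; it is illustrative only (the sum aggregator, ReLU, and function names are assumptions, not the paper's code):

```python
import numpy as np

def gcn_layer(features, edges, weight):
    """One GCN layer, split into the two phases the paper characterizes.

    features: (V, F) dense vertex-feature matrix
    edges:    iterable of (src, dst) pairs -- the graph structure
    weight:   (F, F_out) dense weight matrix
    """
    # Aggregation phase: irregular, graph-dependent gathers. The memory
    # access pattern is dictated entirely by the edge list
    # (graph-processing-like behavior).
    aggregated = np.zeros_like(features)
    for src, dst in edges:
        aggregated[dst] += features[src]

    # Combination phase: one dense, regular matrix multiplication --
    # a static compute pattern well suited to systolic arrays
    # (neural-network-like behavior).
    return np.maximum(aggregated @ weight, 0.0)  # ReLU
```

The aggregation loop defies static scheduling because its accesses depend on the graph, while the combination matmul is fully regular; this asymmetry is precisely why HyGCN pairs two different processing engines.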

Technical Contributions

  1. Programming Model: HyGCN introduces a novel edge- and matrix-vector multiplication (MVM)-centric programming model. This model effectively exploits fine-grained parallelism and abstracts the GCN operations into edge-centric aggregation and standard MVM operations, enabling a hybrid hardware design.
  2. Aggregation Engine: The Aggregation phase is optimized using interval-shard partitioning and a dynamic sparsity elimination technique that skips fetches and computations for absent edges, minimizing unnecessary data accesses (a simplified sketch follows this list). Additionally, a vertex-disperse processing mode exploits edge-level and intra-vertex parallelism by scheduling the workload dynamically across all cores.
  3. Combination Engine: The architecture leverages multi-granular systolic arrays, allowing flexible combination operations through parameter reuse. This not only increases the computation throughput but also reduces the overall energy consumption by optimizing data reuse and synchronized computation across vertices.
  4. Inter-engine Pipeline: HyGCN implements an inter-engine pipeline that fuses the execution of the two phases, streaming aggregation results into the Combination engine to reduce intermediate data movement (a software analogue appears after this list). The pipeline supports both latency-aware and energy-aware operation modes, offering flexibility depending on workload characteristics.
  5. Memory Access Coordination: The paper proposes a priority-based off-chip memory access coordination scheme that improves bandwidth utilization by arbitrating between access requests from the Aggregation and Combination engines, reducing the back-and-forth between disjoint memory regions during data access.
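
As referenced in item 2, the following is a simplified software analogue of interval-shard aggregation with shard-level sparsity skipping. The dense adjacency matrix and the `interval` parameter are illustrative assumptions; HyGCN's actual window-shrinking operates in hardware on sparse graph data:

```python
import numpy as np

def aggregate_by_shards(features, adj, interval=4):
    """Simplified software analogue of interval-shard aggregation.

    features: (V, F) vertex features; adj: (V, V) 0/1 adjacency matrix,
    dense here purely for clarity.
    """
    V, _ = features.shape
    out = np.zeros_like(features)
    for i in range(0, V, interval):        # destination vertex interval
        for j in range(0, V, interval):    # source interval -> one shard
            shard = adj[i:i + interval, j:j + interval]
            if not shard.any():
                # Dynamic sparsity elimination (simplified): an empty
                # shard triggers no feature fetches and no computation.
                continue
            out[i:i + interval] += shard @ features[j:j + interval]
    return out
```

Partitioning bounds the on-chip working set to one interval of destinations and one of sources at a time, and the emptiness check stands in for the hardware's elimination of accesses to sparse regions.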
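
And as referenced in item 4, a generator-based sketch of inter-phase fusion: aggregation results stream into combination one vertex at a time instead of materializing the full aggregation output first. This is a software analogue under assumed data layouts, not the hardware pipeline itself:

```python
import numpy as np

def aggregation_stage(features, edges_by_dst):
    """Yield each vertex's aggregate as soon as it is complete.

    features: (V, F) array; edges_by_dst[v]: list of v's in-neighbors.
    """
    for dst, srcs in enumerate(edges_by_dst):
        yield dst, features[list(srcs)].sum(axis=0)

def combination_stage(aggregates, weight):
    """Consume aggregates one vertex at a time and apply the dense MVM."""
    for dst, acc in aggregates:
        yield dst, acc @ weight

# Fused execution: per-vertex results flow between the stages, analogous
# to HyGCN's latency-aware mode, rather than a barrier between phases.
# for dst, out in combination_stage(aggregation_stage(X, nbrs), W): ...
```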

Experimental Results

The proposed system, HyGCN, demonstrates substantial improvements. Evaluations using popular GCN models such as GCN, GraphSage, and GINConv on standard datasets reveal that HyGCN achieves on average a 1509× speedup with a 2500× reduction in energy consumption compared to a state-of-the-art software framework running on an Intel Xeon CPU, and a 6.5× speedup with a 10× energy reduction compared to the same framework on an NVIDIA V100 GPU. The architecture also achieves higher DRAM bandwidth utilization and fewer total off-chip data accesses by coalescing loads and eliminating redundant fetches.

Implications and Future Directions

HyGCN's architecture highlights the potential for achieving significant performance and energy efficiency through targeted hardware specialization for GCNs. Its hybrid design is well-suited to cope with the diverse execution patterns inherent to GCN workloads. As GCNs continue to gain traction across various domains—ranging from social network analysis and recommendation systems to biological network predictions—the need for such specialized hardware will likely grow.

Future research could extend HyGCN's design concepts to dynamic and real-time GCN applications, potentially exploring adaptive designs that extend flexibility to varying graph topologies and scales. Moreover, integrating support for additional GCN models and incorporating training phases could broaden the scope and utility of GCN accelerators, aligning them closer to the full lifecycle of machine learning workflows.

Overall, HyGCN stands as a testament to the importance of domain-specific acceleration in deep learning, demonstrating substantial strides towards specialized computing solutions for graph-based models.