HEROv2: Full-Stack Open-Source Research Platform for Heterogeneous Computing (2201.03861v1)

Published 11 Jan 2022 in cs.DC, cs.AR, and cs.PF

Abstract: Heterogeneous computers integrate general-purpose host processors with domain-specific accelerators to combine versatility with efficiency and high performance. To realize the full potential of heterogeneous computers, however, many hardware and software design challenges have to be overcome. While architectural and system simulators can be used to analyze heterogeneous computers, they are faced with unavoidable compromises between simulation speed and performance modeling accuracy. In this work we present HEROv2, an FPGA-based research platform that enables accurate and fast exploration of heterogeneous computers consisting of accelerators based on clusters of 32-bit RISC-V cores and an application-class 64-bit ARMv8 or RV64 host processor. HEROv2 allows to seamlessly share data between 64-bit hosts and 32-bit accelerators and comes with a fully open-source on-chip network, a unified heterogeneous programming interface, and a mixed-data-model, mixed-ISA heterogeneous compiler based on LLVM. We evaluate HEROv2 in four case studies from the application level over toolchain and system architecture down to accelerator microarchitecture. We demonstrate how HEROv2 enables effective research and development on the full stack of heterogeneous computing. For instance, the compiler can tile loops and infer data transfers to and from the accelerators, which leads to a speedup of up to 4.4x compared to the original program and in most cases is only 15 % slower than a handwritten implementation, which requires 2.6x more code.

Authors (3)

Andreas Kurth (11 papers)
Björn Forsberg (3 papers)
Luca Benini (362 papers)

Citations (15)


Summary

The paper introduces HEROv2, a full-stack open-source platform that unifies 64-bit hosts with 32-bit accelerators for heterogeneous computing research.
The application-level evaluation reports speedups of up to 4.3x over baseline designs, built on a unified memory space and an LLVM-based toolchain for mixed-data-model code generation.
Compiler-driven optimizations, such as the AutoDMA plugin, automate loop tiling and data transfers, yielding speedups of up to 4.4x with less manual programming effort.

An Evaluation of HEROv2: A Full-Stack Open-Source Platform for Heterogeneous Computing

HEROv2 is a research platform for heterogeneous computing that pairs configurable hardware with a complete software stack. The authors describe each component of the platform and show how it supports research across the heterogeneous computing stack.

At its core, HEROv2 combines an application-class 64-bit ARMv8 or RISC-V (RV64) host processor with accelerators built from clusters of 32-bit RISC-V cores. This combination allows studies across the whole computing stack: applications, toolchains, system architecture, and accelerator microarchitecture. HEROv2 runs on Xilinx FPGAs, so heterogeneous systems can be evaluated in a fully integrated, full-stack environment. All hardware components and the software stack, including a heterogeneous compiler and runtime library, are released under open-source licenses, which enables community-driven research and contributions.
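The 64-bit host and the 32-bit accelerator cores have different native pointer widths, which is why sharing data between them needs compiler and hardware support. The Python sketch below only illustrates the underlying arithmetic; it is not HEROv2 code, and the function names and the example address are hypothetical. It shows that a 64-bit host address may not fit in a 32-bit native pointer, and that carrying it as two 32-bit halves preserves it.

```python
# Illustrative sketch only: pointer-width arithmetic, not HEROv2's implementation.
# All names and the example address are hypothetical.

NATIVE_BITS = 32
NATIVE_MASK = (1 << NATIVE_BITS) - 1  # 0xFFFFFFFF


def fits_in_native_pointer(addr):
    """True if a host address is representable in a 32-bit pointer."""
    return (addr >> NATIVE_BITS) == 0


def truncate_to_native(addr):
    """What a naive cast to a 32-bit pointer would keep (upper bits are lost)."""
    return addr & NATIVE_MASK


def split_wide(addr):
    """Carry a 64-bit address as (high word, low word), each 32 bits wide."""
    return (addr >> NATIVE_BITS) & NATIVE_MASK, addr & NATIVE_MASK


def join_wide(hi, lo):
    """Rebuild the 64-bit address from its two 32-bit halves."""
    return (hi << NATIVE_BITS) | lo


host_addr = 0x0000_007F_8000_1000  # assumed example host virtual address
hi, lo = split_wide(host_addr)
```

With this example address, truncation alone would drop the upper word, while the split-and-join round trip recovers the original value. How the HEROv2 compiler actually represents host pointers on the accelerator is not covered by this sketch.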

Key Technical Contributions and Results

Seamless Data Sharing Across Heterogeneous Domains: Sharing data between processors with different ISAs and pointer widths is complex. To address this, the authors introduce a unified memory space backed by a hybrid IOMMU that bridges the gap between 64-bit hosts and 32-bit accelerators, ensuring coherence at the system level. Custom LLVM-based toolchains generate code for the resulting mixed data model, which reflects a focus on the needs of real-world applications.
Application-Level Evaluation: The evaluation covers several linear algebra kernels, convolution operations, and complete applications such as YOLO, demonstrating HEROv2's capability in practical scenarios. These benchmarks show a significant speedup from software-managed memories and parallel execution on accelerator clusters, reaching up to 4.3x over baseline designs without adding complex code or relying on direct off-chip main memory access.
Compiler-Driven Optimizations: With the AutoDMA plugin, HEROv2 automates memory management: the compiler tiles loops and infers data transfers, work that traditionally falls to the programmer. This automation produced a speedup of up to 4.4x while requiring significantly fewer code changes, which improves productivity.
Insights into Accelerator Architectures and ISA Extensions: HEROv2 also facilitates an extensive evaluation of the impact of ISA extensions typical for domain-specific accelerators, such as custom instructions and hardware loops. These investigations revealed execution speed increases of up to 3.5x in specific scenarios, underscoring the importance of tailored hardware-software interfaces in specialized computation tasks within heterogeneous systems.
Theoretical and Practical Implications: Because HEROv2 supports both ARM and RISC-V hosts, architecture-level decisions that might otherwise require separate evaluation platforms can be investigated on a single one. Its relevance to both academic and industrial work suggests paths for future exploration of highly optimized heterogeneous systems. The open-source model can foster innovation and lets collaborators explore new applications for domain-specific accelerators.
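The loop tiling and data-transfer inference described under Compiler-Driven Optimizations can be illustrated with a small hand-written sketch. The Python below is not HEROv2 or AutoDMA code and does not use their APIs; `dma_copy`, `L1_WORDS`, and the list-based "memories" are hypothetical stand-ins. It shows the general shape of the transformation: a loop over data in main memory is split into tiles that fit a small local buffer, and each tile is copied in, processed locally, and copied back.

```python
# Illustrative sketch only: models software-managed local memory with Python lists.
# L1_WORDS and dma_copy are hypothetical names, not part of HEROv2.

L1_WORDS = 4  # assumed capacity of the local buffer, in elements


def dma_copy(dst, dst_off, src, src_off, n):
    """Stand-in for a DMA burst: copy n elements from src to dst."""
    dst[dst_off:dst_off + n] = src[src_off:src_off + n]


def scale_untiled(main_mem, factor):
    """Original loop: every access goes directly to main memory."""
    for i in range(len(main_mem)):
        main_mem[i] = main_mem[i] * factor


def scale_tiled(main_mem, factor):
    """Tiled loop: process one tile at a time inside a small local buffer."""
    local = [0] * L1_WORDS
    n = len(main_mem)
    for start in range(0, n, L1_WORDS):
        size = min(L1_WORDS, n - start)            # the last tile may be partial
        dma_copy(local, 0, main_mem, start, size)  # transfer tile in
        for j in range(size):                      # compute on local data only
            local[j] = local[j] * factor
        dma_copy(main_mem, start, local, 0, size)  # transfer tile out
```

Both functions compute the same result; the tiled version touches main memory only through bulk copies. A handwritten accelerator implementation would typically also overlap transfers with computation, which this sketch leaves out for brevity.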

Conclusion

HEROv2 gives researchers direct insight into heterogeneous computing through a fully integrated open-source stack. Its modular architecture supports scalability and interdisciplinary studies and broadens access to state-of-the-art computing systems. As heterogeneous systems evolve, platforms like HEROv2 help evaluate new ideas at every level of the computing stack. Researchers working on hardware-software co-design and on new acceleration techniques will find HEROv2 a valuable asset for advancing efficient heterogeneous computation.


GitHub

GitHub - pulp-platform/hero: Heterogeneous Research Platform (HERO) for exploration of heterogeneous computers consisting of programmable many-core accelerators and an application-class host CPU, including full-stack software and hardware. (106 stars)