libhclooc: Software Library Facilitating Out-of-core Implementations of Accelerator Kernels on Hybrid Computing Platforms (1808.05056v1)

Published 15 Aug 2018 in cs.DC

Abstract: Hardware accelerators such as Graphics Processing Units (GPUs), Intel Xeon Phi co-processors (PHIs), and Field-Programmable Gate Arrays (FPGAs) are now ubiquitous in extreme-scale high performance computing (HPC), cloud, and big data platforms, where they facilitate execution of workloads that demand high energy efficiency. They present unique interfaces and programming models, and therefore pose several limitations that must be addressed to facilitate execution of large workloads. There is no library providing a unifying interface that allows programmers to write reusable out-of-core implementations of their data-parallel kernels that can run efficiently on different mainstream accelerators such as GPUs, PHIs, and FPGAs. We address this shortage in this paper. We present a library called libhclooc, which provides a unifying interface facilitating out-of-core implementations of data-parallel kernels on the three mainstream accelerators (GPUs, Intel Xeon Phis, and FPGAs). We implement out-of-core matrix-matrix multiplication (MMOOC) using the libhclooc API and demonstrate its superior performance over vendor implementations. We show that, due to abstraction, it suffers from a maximum overhead of 10%, 4%, and 8% compared to the state-of-the-art optimised implementations for the Nvidia K40c GPU, Nvidia P100 PCIe GPU, and Intel Xeon Phi 3120P, respectively. We also show that using the libhclooc API reduces the number of lines of code (LOC) by 75%, thereby drastically improving programmer productivity.
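To make the term "out-of-core" concrete, the following is a minimal sketch of the general idea behind an out-of-core matrix-matrix multiplication: when the operands do not fit in accelerator memory, the product is computed block by block. It is not the libhclooc API and not the authors' MMOOC implementation. The function names, the `capacity` parameter (a hypothetical device-memory budget counted in matrix elements), and the choice to keep all of B resident while streaming row blocks of A are assumptions made purely for illustration, and plain Python stands in for both the data transfers and the accelerator kernel.

```python
def matmul(a, b):
    """Plain in-memory product of two row-major list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def rows_per_block(n_inner, n_cols, capacity):
    """Largest number of rows of A (and C) that fits alongside all of B.

    capacity is a hypothetical device-memory budget, counted in elements.
    """
    free = capacity - n_inner * n_cols   # assumption: B stays resident
    per_row = n_inner + n_cols           # one row of A plus one row of C
    if free < per_row:
        raise ValueError("budget too small for B plus a single row block")
    return free // per_row


def out_of_core_matmul(a, b, capacity):
    """Compute A x B by streaming row blocks of A through a limited budget.

    Returns the product and the number of blocks that were processed.
    """
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    step = rows_per_block(n_inner, n_cols, capacity)
    c, blocks = [], 0
    for start in range(0, n_rows, step):
        a_block = a[start:start + step]  # stands in for a host-to-device copy
        c.extend(matmul(a_block, b))     # stands in for the accelerator kernel
        blocks += 1                      # result rows are "copied back" into c
    return c, blocks


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8]]
    B = [[1, 0, 2], [0, 1, 3]]
    product, blocks = out_of_core_matmul(A, B, capacity=16)
    print(product, blocks)
```

A real out-of-core implementation would typically also overlap transfers with computation, which this sequential sketch leaves out.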