Papers
Topics
Authors
Recent
2000 character limit reached

RegDem: Increasing GPU Performance via Shared Memory Register Spilling (1907.02894v1)

Published 5 Jul 2019 in cs.PF and cs.DC

Abstract: GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage of on-chip resources, such as registers and the programmer-managed shared memory. Higher resource demand means lower effective parallel thread count, and therefore lower program performance. Our investigation found that registers are often the occupancy limiters. The de-facto nvcc compiler-based approach spills excessive registers to the off-chip memory, ignoring the shared memory and leaving the on-chip resources underutilized. To mitigate the register demand, this paper presents a binary translation technique, called RegDem, that spills excessive registers to the underutilized shared memory by transforming the GPU assembly code (SASS). Most GPU programs do not fully use shared memory, thus allowing RegDem to use it for register spilling. The higher occupancy achieved by RegDem outweighs the slightly higher cost of accessing shared memory instead of placing data in registers. The paper also presents a compile-time performance predictor that models instructions stalls to choose the best version from a set of program variants. Cumulatively, these techniques outperform the nvcc compiler with a 9% geometric mean, the highest observed being 18%.

Citations (5)

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 1 like about this paper.