A Performance Comparison of Sort and Scan Libraries for GPUs (1601.03144v1)
Published 13 Jan 2016 in cs.DC
Abstract: Sorting and scanning are two fundamental primitives for constructing highly parallel algorithms. A number of libraries now provide implementations of these primitives for GPUs, but there is relatively little information about the performance of these implementations. We benchmark seven libraries for 32-bit integer scan and sort, and sorting 32-bit values by 32-bit integer keys. We show that there is a large variation in performance between the libraries, and that no one library has both optimal performance and portability.
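For readers unfamiliar with the three operations named in the abstract (scan, sort, and sorting values by keys), the following is a minimal sequential sketch of what each one computes. It is not taken from the paper or from any of the benchmarked libraries: the function names are hypothetical, scan is assumed here to mean an inclusive prefix sum, and plain Python integers stand in for 32-bit integers (no overflow wrap-around is modelled). GPU libraries compute the same results in parallel.

```python
from itertools import accumulate


def inclusive_scan(values):
    """Inclusive prefix sum: out[i] = values[0] + ... + values[i]."""
    return list(accumulate(values))


def sort_keys(keys):
    """Return the keys in ascending order."""
    return sorted(keys)


def sort_by_key(keys, values):
    """Sort keys ascending and reorder values to follow their keys.

    Ties keep their original relative order because Python's sort is
    stable; whether a given GPU library guarantees this is not assumed.
    """
    order = sorted(range(len(keys)), key=keys.__getitem__)
    return [keys[i] for i in order], [values[i] for i in order]


if __name__ == "__main__":
    print(inclusive_scan([3, 1, 4, 1]))
    print(sort_keys([3, 1, 2]))
    print(sort_by_key([2, 1, 2], [10, 20, 30]))
```

Each value in the key-value case travels with its key, which is why the benchmark treats sorting by key as a separate workload from sorting keys alone.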