Warp-Level Parallelism: Enabling Multiple Replications In Parallel on GPU

Published 7 Jan 2015 in cs.DC | (1501.01405v1)

Abstract: Stochastic simulations need multiple replications in order to build confidence intervals for their results. Even if we do not need a large amount of replications, it is a good practice to speed-up the whole simulation time using the Multiple Replications In Parallel (MRIP) approach. This approach usually supposes to have access to a parallel computer such as a symmetric mul-tiprocessing machine (with many cores), a computing cluster or a computing grid. In this paper, we propose Warp-Level Parallelism (WLP), a GP-GPU-enabled solution to compute MRIP on GP-GPUs (General-Purpose Graphics Processing Units). These devices display a great amount of parallel computational power at low cost, but are tuned to process efficiently the same operation on several data, through different threads. Indeed, this paradigm is called Single Instruction, Multiple Threads (SIMT). Our approach proposes to rely on small threads groups, called warps, to perform independent computations such as replications. We have benchmarked WLP with three different models: it allows MRIP to be computed up to six times faster than with the SIMT computing paradigm.