Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
129 tokens/sec
GPT-4o
28 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Generating High Quality Random Numbers: A High Throughput Parallel Bitsliced Approach (1909.04750v3)

Published 10 Sep 2019 in cs.CR

Abstract: In this work, by employing a bitsliced data representation as building blocks of algorithms, we showcase the capability and scalability of our proposed method in a variety of PRNG methods in the category of block and stream ciphers. While demonstrating the suitability of stream-ciphers for high throughput PRNG, as an example, we implement and investigate a bitsliced MICKEY 2.0 PRNG by altering the paradigm of internal functions and data structure. The LFSR-based (Linear Feedback Shift Register) nature of the PRNG in our implementation perfectly suits the GPU's many-core structure due to its register oriented architecture and allows the usage of bit slicing technique to further improve the performance. In our SIMD vectorized fully parallel GPU implementation, each GPU thread is capable of generating a remarkable number of 32 pseudo-random bits in each LFSR clock cycle. We then compare our implementation with some of the most significant PRNGs that display a satisfactory performance in both throughput and randomness criteria. The proposed implementation successfully passes the NIST test for statistical randomness and bit-wise correlation criteria. To the best of authors' best knowledge, our method outperforms the current best implementations in the literature for computer-based PRNG and the optical solutions in terms of performance and performance per cost, while maintaining an acceptable measure of randomness. Our highest performance among all of the implemented CPRNGs with the proposed method is achieved by the MICKEY 2.0 algorithm which shows 1.9x improvement over the state of the art NVIDIA's proprietary high-performance PRNG, cuRAND library, achieving 1.6 Tb/s of throughput on the affordable NVIDIA GTX 980 Ti.

Citations (4)

Summary

We haven't generated a summary for this paper yet.