A Multi-Stage CUDA Kernel for Floyd-Warshall (1001.4108v2)

Published 23 Jan 2010 in cs.DC and cs.PF

Abstract: We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the best previously reported algorithm. To achieve this speedup, we apply a new technique that reduces usage of on-chip shared memory and allows the CUDA scheduler to hide instruction latency more effectively.
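
For readers unfamiliar with the underlying algorithm, the following is a minimal sequential sketch of classic Floyd-Warshall in C++. It is not the multi-stage CUDA kernel the abstract describes, only the textbook triple loop that such kernels parallelize. The names `Matrix`, `kInf`, and `floyd_warshall` are hypothetical, and the sketch assumes non-negative integer edge weights.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical names for illustration; none of them come from the paper.
// kInf marks "no edge". It is half of INT_MAX so that kInf + kInf cannot overflow.
constexpr int kInf = std::numeric_limits<int>::max() / 2;

using Matrix = std::vector<std::vector<int>>;

// Textbook sequential Floyd-Warshall (assumes non-negative weights).
// On entry, dist[i][j] is the weight of edge i -> j (kInf if absent, 0 on the diagonal).
// On return, dist[i][j] is the length of the shortest path from i to j.
void floyd_warshall(Matrix& dist) {
    const std::size_t n = dist.size();
    for (const auto& row : dist) {
        assert(row.size() == n);  // the matrix must be square
    }
    // Stage k allows vertex k as an intermediate hop. Stages must run in order,
    // but for a fixed k the (i, j) updates do not depend on each other.
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                const int via_k = dist[i][k] + dist[k][j];
                if (via_k < dist[i][j]) {
                    dist[i][j] = via_k;
                }
            }
        }
    }
}
```

The outer loop over `k` is inherently sequential, while the inner two loops expose the data parallelism that GPU implementations exploit. How the paper stages that work across shared memory and registers is not shown here.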