A Multi-Stage CUDA Kernel for Floyd-Warshall (1001.4108v2)
Published 23 Jan 2010 in cs.DC and cs.PF
Abstract: We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the previously best reported algorithm. To achieve this speedup, we applied a new technique that reduces usage of on-chip shared memory and allows the CUDA scheduler to hide instruction latency more effectively.
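
For orientation, the textbook Floyd-Warshall recurrence that GPU implementations parallelize can be sketched as below. This is a minimal sequential reference in Python, not the paper's multi-stage CUDA kernel, whose details the abstract does not give. The function name `floyd_warshall`, the adjacency-matrix input format, and the sample graph are illustrative assumptions.

```python
import math

INF = math.inf


def floyd_warshall(weights):
    """Return the all-pairs shortest-path distance matrix.

    Assumed input format (not from the paper): weights[i][j] is the weight
    of edge i -> j, or math.inf when there is no edge. The graph is assumed
    to contain no negative cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not mutated
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0)  # a vertex reaches itself at cost 0

    # After iteration k, dist[i][j] is the shortest i -> j distance using
    # only intermediate vertices from {0, ..., k}.
    for k in range(n):
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == INF:
                continue  # no path i -> k, so k cannot improve row i
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


# Hypothetical 4-vertex directed graph used only for illustration.
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
result = floyd_warshall(graph)
```

For a fixed `k`, the updates over `i` and `j` are independent of one another, which is what makes the algorithm a natural fit for a GPU; the outer loop over `k` remains sequential.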