Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

ButterFly BFS -- An Efficient Communication Pattern for Multi Node Traversals (2103.13577v1)

Published 25 Mar 2021 in cs.DC and cs.DS

Abstract: Breadth-First Search (BFS) is a building block used in a wide array of graph analytics and is used in various network analysis domains: social, road, transportation, communication, and much more. Over the last two decades, network sizes have continued to grow. The popularity of BFS has brought with it a need for significantly faster traversals. Thus, BFS algorithms have been designed to exploit shared-memory and shared-nothing systems -- this includes algorithms for accelerators such as the GPU. GPUs offer extremely fast traversals at the cost of processing smaller graphs due to their limited memory size. In contrast, CPU shared-memory systems can scale to graphs with several billion edges but do not have enough compute resources needed for fast traversals. This paper introduces ButterFly BFS, a multi-GPU traversal algorithm that allows analyzing significantly larger networks at high rates. ButterFly BFS scales to the similar-sized graphs processed by shared-memory systems while improving performance by more than 10X compared to CPUs. We evaluate our new algorithm on an NVIDIA DGX-2 server with 16 V100 GPUS and show that our algorithm scales with an increase in the number of GPUS. We show that we can achieve a roughly $70\%$ performance linear speedup, which is non-trivial for BFS. For a scale 29 Kronecker graph and edge factor of 8, our new algorithm traverses the graph at a rate of over 300 GTEP/s. That is a high traversal rate for a single server.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (1)
  1. Oded Green (10 papers)
Citations (2)

Summary

We haven't generated a summary for this paper yet.