Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
156 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Achieving High-Performance Fault-Tolerant Routing in HyperX Interconnection Networks (2404.04315v1)

Published 5 Apr 2024 in cs.DC

Abstract: Interconnection networks are key actors that condition the performance of current large datacenter and supercomputer systems. Both topology and routing are critical aspects that must be carefully considered for a competitive system network design. Moreover, when daily failures are expected, this tandem should exhibit resilience and robustness. Low-diameter networks, including HyperX, are cheaper than typical Fat Trees. But, to be really competitive, they have to employ evolved routing algorithms to both balance traffic and tolerate failures. In this paper, SurePath, an efficient fault-tolerant routing mechanism for HyperX topology is introduced and evaluated. SurePath leverages routes provided by standard routing algorithms and a deadlock avoidance mechanism based on an Up/Down escape subnetwork. This mechanism not only prevents deadlock but also allows for a fault-tolerant solution for these networks. SurePath is thoroughly evaluated in the paper under different traffic patterns, showing no performance degradation under extremely faulty scenarios.

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets