Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
129 tokens/sec
GPT-4o
28 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Resilience for Multigrid Software at the Extreme Scale (1506.06185v1)

Published 20 Jun 2015 in cs.MS and cs.NA

Abstract: Fault tolerant algorithms for the numerical approximation of elliptic partial differential equations on modern supercomputers play a more and more important role in the future design of exa-scale enabled iterative solvers. Here, we combine domain partitioning with highly scalable geometric multigrid schemes to obtain fast and fault-robust solvers in three dimensions. The recovery strategy is based on a hierarchical hybrid concept where the values on lower dimensional primitives such as faces are stored redundantly and thus can be recovered easily in case of a failure. The lost volume unknowns in the faulty region are re-computed approximately with multigrid cycles by solving a local Dirichlet problem on the faulty subdomain. Different strategies are compared and evaluated with respect to performance, computational cost, and speed up. Especially effective are strategies in which the local recovery in the faulty region is executed in parallel with global solves and when the local recovery is additionally accelerated. This results in an asynchronous multigrid iteration that can fully compensate faults. Excellent parallel performance on a current peta-scale system is demonstrated.

Citations (10)

Summary

We haven't generated a summary for this paper yet.