Papers
Topics
Authors
Recent
Search
2000 character limit reached

Self-Stabilizing Snapshot Objects for Asynchronous Fail-Prone Network Systems

Published 14 Jun 2019 in cs.DC | (1906.06420v2)

Abstract: A snapshot object simulates the behavior of an array of single-writer/multi-reader shared registers that can be read atomically. Delporte-Gallet et al. proposed two fault-tolerant algorithms for snapshot objects in asynchronous crash-prone message-passing systems. Their first algorithm is \emph{non-blocking}; it allows snapshot operations to terminate once all write operations have ceased. It uses $O(n)$ messages of $O(n \nu)$ bits, where $n$ is the number of nodes and $\nu$ is the number of bits it takes to represent the object. Their second algorithm allows snapshot operations to always terminate independently of write operations. It incurs $O(n2)$ messages. The fault model of Delporte-Gallet et al. considers node crashes. We aim at the design of even more robust snapshot objects via the lenses of self-stabilization---a very strong notion of fault-tolerance. In addition to Delporte-Gallet et al.'s fault model, our self-stabilizing algorithm can recover after the occurrence of transient faults; these faults represent arbitrary violations of the assumptions according to which the system was designed to operate. We propose self-stabilizing variations of Delporte-Gallet et al.'s non-blocking algorithm and always-terminating algorithm. Our algorithms have similar communication costs to the ones by Delporte-Gallet et al. and $O(1)$ recovery time from transient faults. The main differences are that our proposal considers repeated gossiping of $O(\nu)$ bit messages and deals with bounded space. We also consider an input parameter, $\delta$, for which we claim an ability to balance the costs of snapshot operations. We validate our correctness proof, evaluate the performance of Delporte-Gallet et al.'s algorithms and our proposed variations and investigate the properties of $\delta$ via PlanetLab experiments, where significant latency and communication costs reduction are observed.

Citations (1)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.