What's Live? Understanding Distributed Consensus (2001.04787v2)
Abstract: Distributed consensus algorithms such as Paxos have been studied extensively, and they all use the same definition of safety. Liveness is especially important in practice, despite well-known theoretical impossibility results. However, many different liveness properties and assumptions have been stated, and there has been no systematic comparison to help understand these properties. This paper systematically studies and compares the liveness properties stated for over 30 prominent consensus algorithms and variants. We introduce a precise high-level language and formally specify these properties in it. We then create a hierarchy of liveness properties that combines two hierarchies of the assumptions used with a hierarchy of the assertions made, and we compare the strengths and weaknesses of algorithms that ensure these properties. Our formal specifications and systematic comparisons led to the discovery of a range of problems in various stated liveness properties. These range from assumptions too weak for any liveness assertion to hold, to assumptions so strong that they make the assertions trivial to achieve. We also developed TLA+ specifications of these liveness properties, and we used model checking of execution steps to illustrate liveness patterns for Paxos.