Hydra: Resilient and Highly Available Remote Memory (1910.09727v3)
Abstract: We present Hydra, a low-latency, low-overhead, and highly available resilience mechanism for remote memory. Hydra can access erasure-coded remote memory within a single-digit microsecond read/write latency, significantly improving the performance-efficiency trade-off over the state-of-the-art -- it performs similar to in-memory replication with 1.6X lower memory overhead. We also propose CodingSets, a novel coding group placement algorithm for erasure-coded data, that provides load balancing while reducing the probability of data loss under correlated failures by an order of magnitude. With Hydra, even when only 50% of memory is local, unmodified memory-intensive applications achieve performance close to that of the fully in-memory case in the presence of remote failures and outperform the state-of-the-art solutions by up to 4.35X.