- The paper proposes and analyzes using redundancy to reduce latency in distributed systems, employing a queuing model and empirical studies on DNS, databases, and TCP handshakes to show its effectiveness, particularly for tail latency.
- Theoretical analysis using a queuing model shows redundancy is beneficial up to 25-50% system load and more effective with higher service time variability, while client-side overhead can reduce gains.
- Empirical evaluations across DNS queries, database systems, and TCP handshakes demonstrate that redundancy significantly reduces both mean and tail latency, mitigating high-latency outliers and packet loss impact.
Low Latency via Redundancy: A Summary
The paper "Low Latency via Redundancy" presents a comprehensive exploration of redundancy as a technique for mitigating latency in distributed systems. Addressing the challenge of achieving consistently low latency, particularly in the tail of the latency distribution, the authors systematically investigate the scenarios in which redundancy is beneficial.
The core hypothesis of the paper is that redundancy can convert excess system capacity into reduced latency: a client initiates the same operation on multiple resources and uses the first result that completes. This approach can improve latency despite variability or uncertainty in system conditions.
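As a concrete illustration of this pattern (a minimal sketch, not code from the paper; `fetch` and the replica list are hypothetical placeholders), a client can issue the same request to several replicas and keep whichever response arrives first:

```python
# Minimal redundancy sketch: submit the same request to every replica and
# return the earliest result. `fetch(replica, request)` is a hypothetical
# blocking call standing in for any RPC, DNS lookup, or database query.
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def first_response(replicas, request, fetch):
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(fetch, r, request) for r in replicas]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # return now; the slower copies keep running
    return next(iter(done)).result()
```

Note that the losing copies still consume server capacity; that cost is exactly what the paper's queuing analysis weighs against the latency gain.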
Theoretical Framework
The authors begin by establishing a queuing model to analyze redundancy. The model weighs the latency reduction obtained by taking the minimum response time across redundant requests against the cost of the increased system utilization those extra requests impose. Several key findings emerge:
- The threshold load, the maximum system utilization at which redundancy remains beneficial, ranges between 25% and 50% depending on the service time distribution.
- The effectiveness of redundancy is heightened with increased variability in service time distributions.
- Client-side overhead relative to server-side service time can diminish the performance gains of redundancy.
Based on the model's analysis, the authors conjecture that the threshold load is minimized at approximately 25.82%, attained when service times are deterministic, the least variable case.
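To make the threshold-load behavior concrete, the following simulation sketch (my construction for illustration, not the paper's analytical model) compares mean latency with and without two-way replication across a bank of FIFO servers; both copies of a replicated job are processed to completion, so replication doubles the offered load:

```python
import random

def simulate(load, n_servers=10, n_jobs=100_000, service=1.0,
             replicate=False, seed=0):
    rng = random.Random(seed)
    rate = load * n_servers / service   # arrival rate of distinct jobs
    free = [0.0] * n_servers            # time each server next becomes idle
    t = total = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(rate)      # Poisson arrivals
        completions = []
        for s in rng.sample(range(n_servers), 2 if replicate else 1):
            start = max(t, free[s])     # FIFO queue at each server
            free[s] = start + service   # deterministic service time
            completions.append(free[s])
        total += min(completions) - t   # job finishes with its fastest copy
    return total / n_jobs

for load in (0.1, 0.2, 0.3, 0.4):
    print(f"load={load:.1f}  baseline={simulate(load):.3f}  "
          f"replicated={simulate(load, replicate=True):.3f}")
```

Runs of this sketch show replication winning at low load and losing once utilization grows, qualitatively matching the conjectured threshold for deterministic service times.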
Empirical Insights
Beyond theoretical analysis, the authors conduct empirical evaluations across several applications:
- DNS Queries: Replicating DNS queries across multiple servers yields significant reductions in response time, cutting the fraction of responses slower than 500 ms by a factor of 6.5. This result underscores the practical utility of redundancy in trimming high-latency outliers (a client-side sketch follows this list).
- Database and Network Systems: The paper also examines database queries and in-network packet operations, illustrating scenarios where redundancy reduces both mean and tail latency, particularly when server response times are variable enough that the gain from taking the fastest response outweighs the cost of the added load.
- TCP Handshake Replication: By replicating the initial packets of a TCP connection, clients can mitigate the impact of packet loss on handshake completion time, with estimated savings upwards of 170 ms per KB of additional traffic.
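Here is a client-side sketch of the DNS experiment's idea, using the third-party dnspython package (assumed available; the resolver addresses are illustrative public services, not servers from the paper's study):

```python
# Replicated DNS lookup: query several resolvers in parallel, keep the first
# answer. Requires the third-party dnspython package (pip install dnspython).
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
import dns.resolver

RESOLVERS = ["8.8.8.8", "1.1.1.1", "9.9.9.9"]  # illustrative public resolvers

def query_one(server, name):
    resolver = dns.resolver.Resolver(configure=False)  # skip /etc/resolv.conf
    resolver.nameservers = [server]
    return resolver.resolve(name, "A")  # blocks until an answer or timeout

def redundant_lookup(name):
    pool = ThreadPoolExecutor(max_workers=len(RESOLVERS))
    futures = [pool.submit(query_one, s, name) for s in RESOLVERS]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # don't block on the slower resolvers
    return next(iter(done)).result()

print([record.to_text() for record in redundant_lookup("example.com")])
```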
Systems Perspective
The implications of redundancy are significant for system designers optimizing both fixed and variable resources. In a fixed-resource environment, such as a data center, replication offers noticeable latency benefits up to the utilization threshold identified by the model.
In variable-resource environments, characterized by elastic resource availability and per-use cost (e.g., cloud services), the decision to employ redundancy hinges on balancing latency reduction against the economic cost of the additional resource consumption.
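One way to frame that balance is a back-of-the-envelope break-even check; every number below is hypothetical, not from the paper:

```python
# Hypothetical break-even check for replication in a pay-per-use environment.
value_per_ms  = 0.0001    # assumed business value of 1 ms saved, in dollars
latency_saved = 30.0      # assumed expected ms saved per replicated request
cost_per_copy = 0.000002  # assumed marginal cost of one redundant request

benefit = value_per_ms * latency_saved  # $0.003 expected value per request
print("replicate" if benefit > cost_per_copy else "skip replication")
```

Redundancy pays off whenever the expected value of the latency saved exceeds the marginal cost of the duplicate work.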
Conclusion and Future Directions
The authors clearly articulate the conditions under which redundancy can reliably reduce latency in networked systems, demonstrating its value in numerous scenarios where it has historically been underutilized. While their research offers a substantial basis for further exploration, future work could refine the understanding of redundancy's impact by considering adaptive redundancy strategies and their fit with dynamically managed environments. Additionally, while the theoretical conjectures offer direction, empirical analysis across broader system architectures and scales could further illuminate how redundancy minimizes latency under variable network conditions.