Contrasting Effects of Replication in Parallel Systems: From Overload to Underload and Back (1602.07978v1)
Abstract: Task replication has recently been advocated as a practical solution to reduce latencies in parallel systems. Several convincing empirical studies support this, and other studies provide analytical results, yet only under strong assumptions such as Poisson arrivals, exponential service times, or independent service times of the replicas themselves, which may give rise to contrasting and perhaps contrived behavior. For instance, under the second assumption, an overloaded system can be stabilized by a replication factor, but can be sent back into overload through further replication. In turn, under the third assumption, strictly larger stability regions of replication systems do not necessarily imply smaller delays. Motivated by the need to dispense with such common and restrictive assumptions, which may additionally cause unexpected behavior, we develop a unified and general theoretical framework to compute tight bounds on the distribution of response times in general replication systems. These results immediately yield the optimal number of replicas minimizing response time quantiles, depending on the parameters of the system (e.g., the degree of correlation amongst replicas). As a concrete application of our framework, we design a novel replication policy which can improve the stability region of classical fork-join queueing systems by $\mathcal{O}(\ln K)$, where $K$ is the number of servers.