Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
144 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Optimization for Speculative Execution of Multiple Jobs in a MapReduce-like Cluster (1406.0609v3)

Published 3 Jun 2014 in cs.DC

Abstract: Nowadays, a computing cluster in a typical data center can easily consist of hundreds of thousands of commodity servers, making component/ machine failures the norm rather than exception. A parallel processing job can be delayed substantially as long as one of its many tasks is being assigned to a failing machine. To tackle this so-called straggler problem, most parallel processing frameworks such as MapReduce have adopted various strategies under which the system may speculatively launch additional copies of the same task if its progress is abnormally slow or simply because extra idling resource is available. In this paper, we focus on the design of speculative execution schemes for a parallel processing cluster under different loading conditions. For the lightly loaded case, we analyze and propose two optimization-based schemes, namely, the Smart Cloning Algorithm (SCA) which is based on maximizing the job utility and the Straggler Detection Algorithm (SDA) which minimizes the overall resource consumption of a job. We also derive the workload threshold under which SCA or SDA should be used for speculative execution. Our simulation results show both SCA and SDA can reduce the job flowtime by nearly 60% comparing to the speculative execution strategy of Microsoft Mantri. For the heavily loaded case, we propose the Enhanced Speculative Execution (ESE) algorithm which is an extension of the Microsoft Mantri scheme. We show that the ESE algorithm can beat the Mantri baseline scheme by 18% in terms of job flowtime while consuming the same amount of resource.

Citations (8)

Summary

We haven't generated a summary for this paper yet.