Papers
Topics
Authors
Recent
Search
2000 character limit reached

Bandwidth-Aware Scheduling with SDN in Hadoop: A New Trend for Big Data

Published 12 Mar 2014 in cs.DC, cs.NI, and cs.PF | (1403.2800v1)

Abstract: Software Defined Networking (SDN) is a revolutionary network architecture that separates out network control functions from the underlying equipment and is an increasingly trend to help enterprises build more manageable data centers where big data processing emerges as an important part of applications. To concurrently process large-scale data, MapReduce with an open source implementation named Hadoop is proposed. In practical Hadoop systems one kind of issue that vitally impacts the overall performance is know as the NP-complete minimum make span problem. One main solution is to assign tasks on data local nodes to avoid link occupation since network bandwidth is a scarce resource. Many methodologies for enhancing data locality are proposed such as the HDS and state-of-the-art scheduler BAR. However, all of them either ignore allocating tasks in a global view or disregard available bandwidth as the basis for scheduling. In this paper we propose a heuristic bandwidth-aware task scheduler BASS to combine Hadoop with SDN. It is not only able to guarantee data locality in a global view but also can efficiently assign tasks in an optimized way. Both examples and experiments demonstrate that BASS has the best performance in terms of job completion time. To our knowledge, BASS is the first to exploit talent of SDN for big data processing and we believe it points out a new trend for large-scale data processing.

Citations (100)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (4)

Collections

Sign up for free to add this paper to one or more collections.