Differentiated latency in data center networks with erasure coded files through traffic engineering (1602.05551v1)
Abstract: This paper proposes an algorithm to minimize weighted service latency for different classes of tenants (or service classes) in a data center network where erasure-coded files are stored on distributed disks/racks and access requests are scattered across the network. Because bandwidth is limited at both top-of-the-rack and aggregation switches, and tenants in different service classes need differentiated services, network bandwidth must be apportioned among intra- and inter-rack data flows for the different service classes in line with their traffic statistics. We formulate this problem as weighted queuing and employ a class of probabilistic request scheduling policies to derive a closed-form upper bound on service latency for erasure-coded storage with arbitrary file access patterns and service time distributions. The result enables us to propose a joint optimization of weighted latency (over the different service classes) with three entangled "control knobs": the bandwidth allocation at top-of-the-rack and aggregation switches for different service classes, dynamic scheduling of file requests, and the placement of encoded file chunks (i.e., data locality). The joint optimization is shown to be a mixed-integer problem. We develop an iterative algorithm that decouples the joint optimization into three sub-problems, each of which is either convex or solvable via bipartite matching in polynomial time. The proposed algorithm is prototyped in an open-source distributed file system, Tahoe, and evaluated on a cloud testbed with 16 separate physical hosts in an OpenStack cluster using 48-port Cisco Catalyst switches. Experiments validate our theoretical latency analysis and show significant latency reduction for diverse file access patterns. The results provide valuable insights on designing low-latency data center networks with erasure-coded storage.
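To make the notion of probabilistic request scheduling concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes a file coded with an (n, k) erasure code, so that a read needs chunks from any k of the n nodes holding them, and it assumes the scheduler is given a per-node selection probability that sums to k. The function name `pick_nodes`, the example probabilities, and the use of systematic sampling to realize those probabilities are all illustrative choices that do not come from the abstract.

```python
import random


def pick_nodes(probs, k, rng=random):
    """Pick k distinct node indices so that node j is picked with probability probs[j].

    Hypothetical helper: uses systematic sampling, which requires
    0 <= probs[j] <= 1 for every j and sum(probs) == k.
    """
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("each probability must lie in [0, 1]")
    if abs(sum(probs) - k) > 1e-9:
        raise ValueError("probabilities must sum to k")

    # Lay the probabilities end to end on [0, k) and drop k points
    # u, u+1, ..., u+k-1; each interval has length <= 1, so it
    # catches at most one point.
    target = rng.random()
    upper = 0.0
    chosen = []
    for j, p in enumerate(probs):
        upper += p
        if target < upper and len(chosen) < k:
            chosen.append(j)
            target += 1.0

    # Guard against floating-point round-off leaving a slot unfilled.
    for j in reversed(range(len(probs))):
        if len(chosen) == k:
            break
        if probs[j] > 0.0 and j not in chosen:
            chosen.append(j)
    return chosen


if __name__ == "__main__":
    # Five nodes hold chunks of a (5, 3)-coded file; node 2 is always used.
    print(pick_nodes([0.5, 0.5, 1.0, 0.5, 0.5], k=3))
```

In this sketch the selection probabilities act as the scheduling "knob": shifting probability mass away from a congested node lowers the share of requests it serves, while every request is still sent to exactly k nodes.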