Patterns in the Chaos - a Study of Performance Variation and Predictability in Public IaaS Clouds (1411.2429v2)

Published 10 Nov 2014 in cs.DC

Abstract: Benchmarking the performance of public cloud providers is a common research topic. Previous research has already extensively evaluated the performance of different cloud platforms for different use cases, and under different constraints and experiment setups. In this paper, we present a principled, large-scale literature review to collect and codify existing research regarding the predictability of performance in public Infrastructure-as-a-Service (IaaS) clouds. We formulate 15 hypotheses relating to the nature of performance variations in IaaS systems, to the factors of influence of performance variations, and how to compare different instance types. In a second step, we conduct extensive real-life experimentation on Amazon EC2 and Google Compute Engine to empirically validate those hypotheses. At the time of our research, performance in EC2 was substantially less predictable than in GCE. Further, we show that hardware heterogeneity is in practice less prevalent than anticipated by earlier research, while multi-tenancy has a dramatic impact on performance and predictability.

Citations (215)

Summary

  • The paper reveals that empirical benchmarking challenges conventional beliefs about hardware heterogeneity in public IaaS clouds.
  • The paper employs extensive experiments with over 53,000 measurements to validate 15 hypotheses on multi-tenancy and regional performance differences.
  • The paper emphasizes the need for targeted benchmarking in capacity planning as provider-specific performance variation impacts cost-efficiency in cloud deployments.

Patterns in the Chaos: A Study of Performance Variation and Predictability in Public IaaS Clouds

The paper by Leitner and Cito presents an empirical investigation into the performance predictability of public Infrastructure-as-a-Service (IaaS) clouds. The research centers on systematically evaluating whether established cloud benchmarking assumptions still hold for a range of current cloud providers. Drawing on a structured literature review, the authors formulate 15 hypotheses on performance variability and predictability in public IaaS infrastructures, and then validate them empirically through extensive real-life experimentation across four prominent cloud service providers.

The paper is grounded in the realities of public IaaS platforms such as Amazon EC2, Google Compute Engine (GCE), Microsoft Azure, and IBM Softlayer. Using five benchmark tests across varying configurations and collecting a dataset of 53,918 measurements, the researchers empirically test the formulated hypotheses. They focus on performance predictability, investigating influencing factors such as hardware heterogeneity, multi-tenancy, temporal variations (time of day and day of week), and regional differences.
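
The summary does not reproduce the paper's analysis code, but the notion of predictability used throughout can be made concrete with a simple dispersion metric. The sketch below computes the coefficient of variation (relative standard deviation) over repeated runs of a benchmark on one configuration; the sample values are hypothetical and only illustrate the comparison, not figures from the study.

```python
import statistics

def coefficient_of_variation(samples):
    """Relative standard deviation of repeated benchmark runs.

    Lower values indicate more predictable performance."""
    mean = statistics.mean(samples)
    return statistics.stdev(samples) / mean if mean else float("inf")

# Hypothetical throughput results (MB/s) from repeated runs of the same
# benchmark on the same instance type -- not data from the paper.
provider_a = [101.2, 55.7, 98.4, 47.3, 102.9]   # erratic
provider_b = [88.1, 86.5, 90.2, 87.8, 89.0]     # stable

for name, samples in (("provider A", provider_a), ("provider B", provider_b)):
    print(f"{name}: CV = {coefficient_of_variation(samples):.2f}")
```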

One of the core assertions examined concerns hardware heterogeneity, previously identified as a key driver of cloud performance variability. Leitner and Cito find that hardware heterogeneity is less prevalent than prior studies indicated, manifesting prominently only in limited scenarios, such as on Azure and for certain EC2 instance types. Their findings suggest that the reduced hardware variety reflects cloud vendors' efforts to deliver more consistent and predictable services.
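
Earlier heterogeneity studies typically inferred the backing hardware of a VM from the CPU model string it exposes. A minimal sketch of that approach is shown below; reading /proc/cpuinfo works on Linux guests, and the tallied model names are invented purely for illustration.

```python
from collections import Counter

def cpu_model(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU model string exposed to the guest (Linux only)."""
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("model name"):
                return line.split(":", 1)[1].strip()
    return "unknown"

# In a heterogeneity study, cpu_model() would be collected on many freshly
# provisioned VMs of one instance type.  The tally below uses made-up values:
observed = [
    "Intel(R) Xeon(R) E5-2670",
    "Intel(R) Xeon(R) E5-2670",
    "Intel(R) Xeon(R) E5-2651 v2",
]
# A near-uniform tally points to homogeneous hardware; many distinct models
# would indicate the heterogeneity reported by earlier research.
print(Counter(observed))
```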

Significantly, the authors highlight how multi-tenancy affects performance predictability differently across cloud providers, emphasizing that certain providers manage resource contention, often called the "noisy neighbor" problem, more effectively. For instance, they observe substantial unpredictability in I/O-specific benchmarks on EC2 and Azure, whereas GCE and Softlayer demonstrate more predictable performance. This divergence underlines the need for cloud users to rigorously benchmark resources within their own operational environment as part of any capacity planning.
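
To see whether multi-tenancy effects reach a given deployment, the obvious step is to repeat an I/O benchmark on the actual instances in use and examine the spread. The sketch below uses a crude sequential-write timing loop as a stand-in for the paper's benchmark tooling; block size, file size, and run count are arbitrary choices.

```python
import os
import tempfile
import time

def disk_write_benchmark(size_mb=64, block_kb=256):
    """Time a sequential write of size_mb MiB and return throughput in MiB/s.

    A crude stand-in for a proper I/O benchmark, sufficient to expose spread."""
    block = os.urandom(block_kb * 1024)
    blocks = size_mb * 1024 // block_kb
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    return size_mb / elapsed

# Repeat the benchmark on the instance under test; a wide spread between runs
# hints at contention from co-located tenants rather than a uniformly slow disk.
runs = [disk_write_benchmark() for _ in range(5)]
print("throughput (MiB/s):", [round(r, 1) for r in runs])
print("spread (max/min):  ", round(max(runs) / min(runs), 2))
```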

Counter to speculation about temporal influence, the paper finds no statistically significant impact of time of day or day of the week on performance variability. Region, however, remains a substantial determinant, with notable differences in performance characteristics between geographic locations for nearly all providers.
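
One way to check a temporal hypothesis like this is to group measurements by hour of day and apply a non-parametric test across the groups. The snippet below uses SciPy's Kruskal-Wallis test; this is an illustrative choice, not necessarily the statistical procedure used in the paper.

```python
from collections import defaultdict
from scipy.stats import kruskal  # non-parametric, no normality assumption

def time_of_day_p_value(measurements):
    """measurements: iterable of (hour_of_day, benchmark_score) pairs.

    Returns the Kruskal-Wallis p-value for differences across hourly groups."""
    groups = defaultdict(list)
    for hour, score in measurements:
        groups[hour].append(score)
    _, p_value = kruskal(*groups.values())
    return p_value

# A p-value above the chosen significance level (e.g., 0.05) means the null
# hypothesis of "no difference between hours" cannot be rejected -- consistent
# with the paper's finding that time of day has no significant impact.
```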

A key practical implication of these findings concerns the difficulty of selecting appropriate instance types. Leitner and Cito advocate precise, targeted benchmarking grounded in the application use case, because cost-performance efficiency varies markedly between providers and specific configurations. Notably, they articulate reservations about specialized instance types, such as I/O-optimized VMs, which may not universally offer superior cost-performance ratios.
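
The reservation about specialized instance types boils down to simple arithmetic: divide hourly price by the benchmark score that matters for the workload and compare. The numbers below are invented for illustration, not prices or scores from the paper.

```python
def cost_per_unit(price_per_hour, benchmark_score):
    """Dollars per hour per unit of benchmark performance; lower is better."""
    return price_per_hour / benchmark_score

# Hypothetical hourly prices (USD) and I/O throughput scores (MB/s).
general_purpose = cost_per_unit(price_per_hour=0.10, benchmark_score=250)
io_optimized = cost_per_unit(price_per_hour=0.35, benchmark_score=600)

# Despite its higher absolute throughput, the specialized type can be the
# worse deal per dollar -- exactly the kind of case the authors caution about.
print(f"general purpose: {general_purpose:.5f} $/h per MB/s")   # 0.00040
print(f"I/O optimized:   {io_optimized:.5f} $/h per MB/s")      # 0.00058
```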

The paper advances the discourse on cloud performance, suggesting that while cloud environments evolve, underlying assumptions about predictability remain bound to provider strategies and market developments. For the scientific community, it calls for renewed empirical studies that continually adapt to ongoing changes in the cloud technology landscape. The authors also recommend longitudinal studies to capture temporal dynamics in cloud performance, equipping practitioners with deeper insights for long-term planning and optimization of cloud-based services.

In conclusion, by underscoring the variation across providers and the current state of hardware heterogeneity and multi-tenancy, Leitner and Cito encourage an evidence-based approach to cloud deployment decisions, urging stakeholders to look beyond prevailing industry narratives and base their strategies on robust, empirical insights. Their work stands as a testament to the nuanced understanding needed to harness the full potential of IaaS clouds reliably amidst the prevailing "chaos" of cloud computing environments.
