
The Night Shift: Understanding Performance Variability of Cloud Serverless Platforms (2304.07177v1)

Published 14 Apr 2023 in cs.DC

Abstract: Function-as-a-Service is a popular cloud programming model that supports developers by abstracting away most operational concerns with automatic deployment and scaling of applications. Due to the high level of abstraction, developers rely on the cloud platform to offer a consistent service level, as decreased performance leads to higher latency and higher cost given the pay-per-use model. In this paper, we measure performance variability of Google Cloud Functions over multiple months. Our results show that diurnal patterns can lead to performance differences of up to 15%, and that the frequency of unexpected cold starts increases threefold during the start of the week. This behavior can negatively impact researchers that conduct performance studies on cloud platforms and practitioners that run cloud applications.

Citations (11)

Summary

  • The paper empirically reveals significant diurnal and weekly variability in Google Cloud Functions, with warm call latencies varying up to 15% during peak hours.
  • The paper employs high-frequency testing and STL analysis to distinguish between cold and warm starts, uncovering unexpected cold start rates peaking at 12.3% on Mondays.
  • The paper highlights that temporal variability can skew FaaS benchmarks and recommends extended, granular measurements to accurately evaluate serverless performance.

This paper, "The Night Shift: Understanding Performance Variability of Cloud Serverless Platforms" (2304.07177), investigates the temporal performance fluctuations in Function-as-a-Service (FaaS) platforms, specifically focusing on Google Cloud Functions (GCF). Given that developers rely heavily on cloud providers for consistent performance in serverless computing (as operational concerns are abstracted away and billing is pay-per-use), understanding and mitigating performance variability is crucial.

The paper employs a methodology involving the repeated execution of three different FaaS functions (float, matrix, ml) with varying memory configurations (128MB, 256MB, 512MB, or 1024MB, depending on the function's needs) on GCF in the europe-west3 region over two months (December 2022 to February 2023). To distinguish cold-start from warm-start performance, each measurement consisted of two back-to-back invocations (the first cold, the second warm), followed by a 20-minute idle period to ensure the next pair again hit a cold instance. Because each individual function copy could therefore only be measured once every 20 minutes, multiple copies were run in parallel, yielding an aggregate invocation rate of one measurement every 40 seconds and capturing detailed short-term variations.
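The cold/warm pairing protocol described above can be sketched as a small measurement loop. This is an illustrative reconstruction, not the authors' tooling: the function URL is a placeholder, and real deployments would log timestamps and billed duration from the platform rather than client-side wall-clock latency.

```python
import time
import urllib.request
from typing import Callable, Tuple

def timed(call: Callable[[], None]) -> float:
    """Wall-clock latency of one invocation, in seconds."""
    start = time.monotonic()
    call()
    return time.monotonic() - start

def measure_pair(call: Callable[[], None]) -> Tuple[float, float]:
    """Issue two back-to-back invocations: the first is expected to be
    a cold start, the immediate second a warm start."""
    cold = timed(call)
    warm = timed(call)
    return cold, warm

def http_invoke(url: str) -> None:
    with urllib.request.urlopen(url) as resp:
        resp.read()

if __name__ == "__main__":
    # Placeholder URL, not from the paper; one copy of this loop measures
    # a pair every 20 minutes, so several copies run in parallel to reach
    # the paper's aggregate rate of one measurement every 40 seconds.
    FUNCTION_URL = "https://europe-west3-example.cloudfunctions.net/float128"
    COOLDOWN_S = 20 * 60  # long enough for GCF to evict the warm instance
    while True:
        cold, warm = measure_pair(lambda: http_invoke(FUNCTION_URL))
        print(f"cold={cold:.3f}s warm={warm:.3f}s")
        time.sleep(COOLDOWN_S)
```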

The key metrics analyzed are:

  1. Request-response latency: Measured by the billed duration of function executions.
  2. Unexpected cold starts: Cold starts occurring immediately after a warm invocation, indicating potential instance recycling.
  3. Long-term trends: Analyzed using Seasonal Trend Decomposition using LOESS (STL) to identify daily/weekly seasonality, overall trends, and outliers.

The findings reveal significant performance variability in GCF:

  • Performance Variability (Warm Calls): Warm call billed duration exhibits strong diurnal patterns. The float function with 128MB memory, for example, showed a 15% increase in average billed duration during working hours (07:00-16:00) compared to nighttime (23:00-06:00). Weekly trends were also observed, with slightly lower billed duration on weekends compared to the start of the week, though the magnitude was smaller (~4%). Higher memory configurations generally resulted in less relative daily performance change, although larger memory sizes still showed variability (e.g., ml functions with 512MB/1024MB changed ~4%). The analysis of latency distribution for the 256MB float function suggested that GCF might be fulfilling these requests using either 128MB or 512MB containers, leading to high variability at this specific memory size. Cold start duration, while significantly longer than warm starts (9-10x on average), did not seem to be affected by the configured memory size.
  • Unexpected Cold Starts: The frequency of unexpected cold starts displayed a clear weekly seasonality superimposed on a daily pattern. During working hours (09:00-17:00, Mon-Fri), the average frequency was 9.8%, peaking at 12.3% on Mondays. This is considerably higher than the 3.7% frequency observed during the night (20:00-08:00) or the 3.6% frequency during weekends. This increase in unexpected cold starts during peak times suggests increased resource contention and platform-driven instance recycling or eviction.
  • Long-Term Trend and Outliers: The STL analysis showed fluctuations in the overall performance trend over the two-month period, with changes up to 21%, but no clear continuous improvement or degradation. Change points in the trend often occurred during the night, potentially coinciding with platform updates. Outliers (hours with exceptionally high average duration) were observed around the turn of the month, possibly linked to increased load from monthly batch processes.
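The hourly unexpected-cold-start frequencies above can be computed from the pair protocol's logs: whenever the second call of a back-to-back pair is cold despite following a warm invocation, it counts as unexpected. The sketch below assumes a simple `(timestamp, second_call_was_cold)` record shape, which is illustrative rather than taken from the paper.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

def unexpected_cold_rate(
    pairs: Iterable[Tuple[datetime, bool]],
) -> Dict[int, float]:
    """Fraction of unexpected cold starts per hour of day (0-23).

    Each record is (timestamp of the pair, whether the second call --
    expected to be warm -- was in fact a cold start)."""
    hits: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for ts, was_cold in pairs:
        total[ts.hour] += 1
        hits[ts.hour] += was_cold  # bool counts as 0 or 1
    return {h: hits[h] / total[h] for h in total}
```

Grouping the same records by weekday instead of hour would reproduce the paper's weekly view, e.g. the 12.3% Monday peak versus 3.6% on weekends.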

These findings have significant implications for both researchers and practitioners:

  • Validity of Benchmarks: The observed temporal variability can confound comparative performance studies of FaaS platforms. Benchmark results can be skewed if experiments are conducted at different times of the day or week, or across regions in different time zones. A survey of existing FaaS benchmarking papers found that nearly two-thirds lacked sufficient detail regarding execution timing or region to rule out such effects. The authors demonstrate this by replicating a SeBS benchmark experiment, showing that the measured performance difference between AWS Lambda and GCF varies significantly depending on the time of day. To address this, researchers should repeat experiments over a full day or longer and explicitly report execution times, or use techniques like parallel "duet" benchmarking.
  • Application Performance: Serverless applications deployed on GCF are directly impacted. Functions may experience increased latency and higher execution costs (due to pay-per-second billing) during daytime/peak hours. While postponing time-sensitive executions isn't feasible, practitioners might consider shifting workloads to different cloud regions if performance benefits outweigh increased network latency and data transfer costs. This requires continuous cross-region performance monitoring. Adaptive systems designed to optimize FaaS applications based on performance data also need to account for this inherent platform variability and distinguish it from changes caused by application-level factors or deployment updates. Initial measurements might need to be longer or multiple configurations tested concurrently.
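The duet-style mitigation mentioned above can be sketched as paired, interleaved measurements: if both platforms are measured at (nearly) the same moment, time-of-day effects hit both alike and largely cancel in the per-round difference. This is a loose adaptation for illustration; strict duet benchmarking runs both workloads concurrently rather than alternating, and the invoke callables here are placeholders.

```python
from typing import Callable, List

def duet_compare(
    invoke_a: Callable[[], float],
    invoke_b: Callable[[], float],
    rounds: int,
) -> List[float]:
    """Interleave latency measurements of two platforms so that any
    shared time-of-day effect cancels in the per-round difference.
    Each callable returns one measured latency in seconds."""
    diffs = []
    for _ in range(rounds):
        a = invoke_a()  # latency on platform A at time t
        b = invoke_b()  # latency on platform B at (nearly) the same t
        diffs.append(a - b)
    return diffs
```

Reporting the distribution of these paired differences, rather than two independently collected latency distributions, makes a cross-platform comparison far less sensitive to the diurnal effects this paper documents.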

The paper acknowledges limitations, primarily its focus on GCF and CPU-bound workloads. Future work includes extending experiments to other FaaS platforms, regions, memory sizes, programming languages, and exploring variability in other resource dimensions like memory access, disk I/O, and network performance. Continuous measurement is also highlighted as important, as cloud platforms are constantly evolving.

In conclusion, the paper provides empirical evidence of significant daily and weekly performance variability, including increased latency and cold start frequency, on Google Cloud Functions. This variability is attributed to factors like resource contention during peak hours and platform updates. The findings underscore the importance of accounting for temporal variations in FaaS benchmarking and highlight challenges and potential strategies for building performant and cost-effective serverless applications.