How Does It Function? Characterizing Long-term Trends in Production Serverless Workloads (2312.10127v1)

Published 15 Dec 2023 in cs.PF, cs.DC, and cs.LG

Abstract: This paper releases and analyzes two new Huawei cloud serverless traces. The traces span a period of over 7 months with over 1.4 trillion function invocations combined. The first trace is derived from Huawei's internal workloads and contains detailed per-second statistics for 200 functions running across multiple Huawei cloud data centers. The second trace is a representative workload from Huawei's public FaaS platform. This trace contains per-minute arrival rates for over 5000 functions running in a single Huawei data center. We present the internals of a production FaaS platform by characterizing resource consumption, cold-start times, programming languages used, periodicity, per-second versus per-minute burstiness, correlations, and popularity. Our findings show that there is considerable diversity in how serverless functions behave: requests vary by up to 9 orders of magnitude across functions, with some functions executed over 1 billion times per day; scheduling time, execution time and cold-start distributions vary across 2 to 4 orders of magnitude and have very long tails; and function invocation counts demonstrate strong periodicity for many individual functions and on an aggregate level. Our analysis also highlights the need for further research in estimating resource reservations and time-series prediction to account for the huge diversity in how serverless functions behave. Datasets and code available at https://github.com/sir-lab/data-release

Citations (25)

Summary

  • The paper reveals that production serverless workloads exhibit dramatic invocation variations across two distinct Huawei Cloud traces.
  • It employs granular per-second and per-minute traces to analyze diurnal patterns, resource utilization, and cold-start latencies.
  • Findings suggest opportunities for optimizing resource scheduling and improving predictive workload forecasting models.

"How Does It Function? Characterizing Long-term Trends in Production Serverless Workloads" offers a detailed examination of serverless workloads in Huawei Cloud's environments, drawing on data from both internal and public-facing platforms. Spanning more than seven months and over 1.4 trillion function invocations, the traces provide significant insight into production Function-as-a-Service (FaaS) systems.

Overview and Dataset Description

The paper introduces two distinct traces from Huawei's infrastructures:

  1. Huawei Private Trace: Includes per-second statistics across multiple data centers for Huawei’s internal workloads. This trace covers 200 functions over 234 days, offering granular data on invocations, execution times, and system resource usage.
  2. Huawei Public Trace: Reflects data from a public FaaS platform with over 5000 functions operating within a single data center. This dataset is less granular but provides per-minute invocation counts over 26 days.

The analysis derives statistical features such as request arrival rates, execution delays, cold-start latencies, resource utilization, and periodic trends, which are potentially useful for optimizing resource scheduling and autoscaling strategies in serverless environments.
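As a concrete illustration of the kind of per-minute arrival-rate features such an analysis might compute, here is a minimal Python sketch. The trace values and the peak-to-mean burstiness score are illustrative assumptions, not the paper's exact metrics:

```python
import statistics

def arrival_stats(counts_per_min):
    """Summarize a per-minute invocation trace: mean rate, peak rate,
    and a simple peak-to-mean burstiness score."""
    mean_rate = statistics.fmean(counts_per_min)
    peak_rate = max(counts_per_min)
    burstiness = peak_rate / mean_rate if mean_rate > 0 else float("inf")
    return {"mean": mean_rate, "peak": peak_rate, "burstiness": burstiness}

# Hypothetical trace: a mostly quiet function with one burst.
trace = [2, 3, 2, 50, 2, 1]
stats = arrival_stats(trace)
```

A peak-to-mean ratio well above 1 flags the per-second versus per-minute burstiness the paper characterizes; finer-grained traces make such bursts visible that coarser aggregation would hide.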

Key Findings

Function Invocation Behaviors

  • The disparity in function invocations is dramatic, with requests varying by up to nine orders of magnitude. Notably, some functions are invoked over a billion times per day.
  • The public trace resembles comparable datasets from Azure, suggesting trends that are intrinsic across cloud platforms.

Periodicity and Ranking

  • Strong daily periodicity pervades most functions, reflecting distinct diurnal patterns that typically track human activity or nightly batch processing.
  • Functions demonstrate variable invocation patterns, with the popularity rankings of functions showing relatively minor oscillations over extended periods.
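One simple way to quantify the diurnal periodicity described above is lag autocorrelation at the daily period. The sketch below is illustrative only: the 4-sample "day" and the series are invented for brevity, and the paper's own periodicity analysis may use different measures:

```python
import statistics

def autocorr(series, lag):
    """Lag-k autocorrelation; values near 1.0 indicate strong
    periodicity at that lag (e.g. lag = samples per day)."""
    mean = statistics.fmean(series)
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(len(series) - lag))
    return cov / var

# Hypothetical trace whose "day" is 4 samples long, repeated 6 times.
daily = [1, 5, 9, 5] * 6
score = autocorr(daily, lag=4)
```

For a real per-minute trace the daily lag would be 1440; a high score at that lag is what makes seasonal forecasting and pre-warming viable.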

Resource Consumption Patterns

  • CPU and memory utilization figures frequently fall below the user-defined limits, indicating a significant opportunity for resource optimization through overcommitted resource scheduling.
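The overcommit opportunity implied by utilization falling below user-declared limits can be sketched as a headroom calculation. The instance numbers below are hypothetical, chosen only to illustrate the computation:

```python
def overcommit_headroom(used_mb, limit_mb):
    """Fraction of the reserved memory that goes unused; high values
    suggest the scheduler could safely overcommit."""
    return 1.0 - used_mb / limit_mb

# Hypothetical instances: (observed usage, user-declared limit) in MB.
instances = [(120, 512), (300, 1024), (64, 256)]
headrooms = [overcommit_headroom(u, l) for u, l in instances]
avg_headroom = sum(headrooms) / len(headrooms)
```

An average headroom of ~70% would mean most reserved capacity sits idle, which is the opportunity for overcommitted scheduling the finding points to.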

Cold-start Latencies

  • Cold-start latencies, heavily influenced by package sizes and runtime environments, showcase long tails in their distribution. This highlights the necessity to reduce or mask cold-start durations in highly dynamic and demand-variable environments.
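A long-tailed latency distribution is usually diagnosed by comparing median and tail percentiles. The sketch below uses a nearest-rank percentile and an invented latency sample; the actual trace distributions span several orders of magnitude:

```python
def percentile(values, p):
    """Nearest-rank percentile (p in [0, 100]) of a latency sample."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[rank]

# Hypothetical cold-start latencies in ms: mostly fast, with a long tail.
latencies = [80, 90, 95, 100, 110, 120, 150, 400, 1200, 6000]
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
tail_ratio = p99 / p50  # a large spread signals a long tail
```

When the p99/p50 ratio runs into the tens or hundreds, mean-based provisioning badly underestimates worst-case cold starts, motivating the pre-warming and masking techniques discussed later.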

Implications and Future Prospects

The paper not only underlines the heterogeneity and dynamism of serverless workloads but also accentuates the need for advanced forecasting models capable of fine-grained, long-term workload prediction. The forecasting methods explored, including time-series models such as TimesNet and N-HiTS, struggle to capture precise short-term fluctuations while still accommodating long-range trends.
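A useful reference point when evaluating such learned models is a seasonal-naive baseline, which simply repeats the last full season. This is a generic baseline sketch, not the paper's method, and the 4-step "season" and counts are invented:

```python
def seasonal_naive(history, season, horizon):
    """Forecast by repeating the last full season; a common baseline
    for strongly diurnal series before trying learned models."""
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]

# Hypothetical invocation counts with a 4-step season.
history = [10, 40, 70, 40, 12, 41, 69, 39]
forecast = seasonal_naive(history, season=4, horizon=6)
```

Because many functions in the traces show strong daily periodicity, a learned forecaster should at minimum beat this baseline to justify its cost.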

Areas of Research and Development

  • Resource Scheduling: The identification of correlated function bursts and periodic trends suggests potential for more efficient resource utilization through predictive scheduling.
  • Cold-start Optimization: Addressing cold-start latencies remains paramount, particularly through innovations in pre-warming strategies based on predictive demand analytics.
  • Enhanced Predictive Modeling: The shortcomings of current forecasting models indicate fertile ground for developing tailored prediction algorithms that handle both fine granularity and long time spans.
  • Global Univariate Time Series Models: Given the findings, these models hold promise for improving forecasting efficacy while reducing the computational overhead typical of multivariate approaches.

Conclusion

The paper successfully highlights the considerable variability in serverless function behavior across Huawei Cloud's infrastructures and provides meaningful guidance on managing serverless workloads more effectively. The findings advocate for the evolution of cloud resource management practices, aligning them closely with the observed operational nuances of serverless environments. While incremental advancements in machine learning techniques for forecasting fine-grained data are warranted, broader reconsideration of serverless architecture and cloud scheduling paradigms could expedite improvements in resource efficiency and application performance.
