DeathStarBench: Benchmarking Microservices

Updated 16 November 2025
  • DeathStarBench is an open-source benchmark suite that simulates real-world microservices architectures to evaluate cloud and IoT applications.
  • It models diverse applications—including social networking, media streaming, e-commerce, banking, and IoT control—using containerized microservices and realistic service graphs.
  • The suite enables detailed analysis of performance metrics, resource utilization, and scalability, while offering modular extensibility for evolving workloads.

DeathStarBench is an open-source benchmark suite designed to enable rigorous, end-to-end evaluation of microservices-based cloud and IoT applications. Departing from prior monolithic benchmarks, DeathStarBench captures the architectural and performance complexities intrinsic to applications composed of tens to hundreds of loosely coupled microservices. Leveraging realistic service graphs that mirror patterns found in large-scale cloud deployments, the suite supports comprehensive system studies—including on-chip, OS/network, and cluster-level effects—and provides modular extensibility for integration of new workloads, protocols, and hardware acceleration schemes (Gan et al., 2019).

1. Design Goals and Benchmark Scope

DeathStarBench addresses multiple system research challenges arising from the microservices paradigm:

  • End-to-End Behavior: Benchmarks the full-stack behavior (application logic, RPC networks, databases, and orchestration) of representative large-scale cloud and IoT applications.
  • Real-World Inspired Services: Provides modular, containerized implementations of canonical workloads:
    • Social networking
    • Media streaming
    • E-commerce
    • Banking
    • IoT swarm control for UAVs
  • Cross-Stack Analysis: Enables experimentation spanning microarchitectural performance, OS/networking overhead, cluster management policies, and tail-at-scale Quality of Service (QoS).
  • Extensibility: New microservices, languages, and protocols can be added with minimal engineering overhead.

DeathStarBench thus establishes itself as a de facto reference suite for systems research on cloud microservices, facilitating controlled analysis under realistic, workload-driven conditions.

2. Service Architectures and Workload Structure

Each included benchmark models a real application as a directed acyclic graph (DAG) of microservices, typically involving 10–30 nodes. Communication occurs over lightweight RPC layers (gRPC/HTTP), with data stores and caches backing hot paths.

| Application Domain | Microservice Count / Structure | Key Workload Features |
|---|---|---|
| Social Network | 20-node DAG; read-fanout / write-fanin | 75% read, 25% write; small JSON payloads |
| Media Streaming | Upload / transcode / CDN-service pipeline | Heavy I/O, bursty transcoding, cacheable streaming |
| E-Commerce | 15-node hub-and-spoke | 60% browse, 20% cart, 20% checkout; sync/async mix |
| Banking | Dual paths (inquiry, transfer) | Distributed transactions; consistency- and latency-sensitive |
| IoT Swarm | Hierarchical fan-out (central planner) | <10 ms control hops, periodic telemetry, broadcast |

Each benchmark exposes end-to-end application paths and stress points, such as the “social network: get timeline” workload (read-fanout), or e-commerce checkout (synchronous distributed locking and async events).
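
To make the read-fanout pattern concrete, the short C++ sketch below models a hypothetical "get timeline" path: a few serial hops followed by a parallel fan-out whose completion time is set by its slowest branch. The service names and latency values are invented for illustration and are not taken from the suite.

```cpp
// Toy model of a read-fanout request path (e.g., "get timeline").
// Service names and latencies are illustrative only, not taken from DeathStarBench.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Hop {
    std::string service;
    double latency_ms;
};

int main() {
    // Serial hops on the critical path: gateway -> timeline frontend.
    std::vector<Hop> serial = {{"api-gateway", 2.0}, {"timeline-frontend", 3.0}};

    // Parallel fan-out to per-user timeline shards plus a cache lookup.
    std::vector<Hop> fanout = {
        {"timeline-shard-1", 8.0},
        {"timeline-shard-2", 12.0},
        {"timeline-shard-3", 9.5},
        {"social-graph-cache", 4.0},
    };

    double serial_ms = 0.0;
    for (const auto& h : serial) serial_ms += h.latency_ms;

    // The fan-out stage completes only when the slowest branch returns,
    // which is why per-service tail latency dominates the end-to-end path.
    double fanout_ms = 0.0;
    for (const auto& h : fanout) fanout_ms = std::max(fanout_ms, h.latency_ms);

    std::cout << "end-to-end latency ~= " << serial_ms + fanout_ms << " ms\n";
    return 0;
}
```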

3. Implementation Infrastructure

DeathStarBench employs a polyglot, container-oriented implementation model for analytical fidelity and extensibility:

  • Programming Languages: Node.js (Express) for quick APIs, Go (gin/gRPC) for high-throughput sections, Java (Spring Boot) for state-heavy/legacy services, and Python (Flask/FastAPI) for ML integration (e.g., fraud detection).
  • Containerization and Orchestration: All services are Dockerized and orchestrated by Kubernetes (Deployments, Services, Horizontal Pod Autoscalers) with configurable resource limits via cgroups. Service discovery and load balancing rely on Kubernetes Services, optionally augmented with an Envoy-based service mesh.
  • Data Models: MongoDB is used for document-centric workloads; Redis/Memcached for hot caches; Cassandra/HBase for time-series; MySQL/Postgres for relational data; MQTT/InfluxDB for IoT telemetry.
  • Orchestration Tools: Provided Helm charts, YAML-based configuration for scaling and resource limits, and batch orchestration via Kubernetes Jobs.

Instrumentation for load generation, metric collection, and profiling includes Locust/Tsung (load), Prometheus/Grafana/InfluxDB (metrics), Linux perf/eBPF (profiling), and Kubernetes metrics API.

4. Measurement Methodology and Performance Metrics

Evaluation centers on metrics relevant to microservices at scale:

  • Throughput (T): Requests per second (RPS), jobs per minute for batch.
  • Latency: Reported as P50, P90, P99, P99.9. End-to-end latency ($L_e$) is computed as the sum of per-hop latencies plus network overhead:

$L_e = \sum_{i} L_i + L_{\text{network}}$

  • Tail-at-Scale Modeling: For $n$ parallel components and latency quantile $p$,

$P_p = \min \{ t : \Pr(L \leq t) \geq p \}$

Tail amplification follows $1 - (1-\epsilon)^n$, where $\epsilon = 1 - p$ is the per-component probability of exceeding $P_p$; for example, with $\epsilon = 0.01$ and $n = 100$ parallel calls, roughly 63% of requests see at least one component miss its P99 (a worked sketch follows at the end of this section).

  • Resource Utilization: CPU%, memory, network I/O, context switches, syscalls per second.
  • Measurement Infrastructure: Locust/Tsung (synthetic user traffic), Prometheus/InfluxDB (metrics collection), Linux perf/eBPF (compute/network), and Kubernetes API (cluster metrics).

This regime allows both system and architectural bottlenecks—including context-switch and syscall overhead, networking stack limitations, and scheduler variance—to be systematically characterized.
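
As a worked sketch of the metrics above, the following standalone C++ program computes nearest-rank percentiles over a synthetic latency trace and evaluates the $1-(1-\epsilon)^n$ tail-amplification estimate. The sample values and the nearest-rank percentile definition are assumptions for illustration, not part of DeathStarBench's tooling.

```cpp
// Nearest-rank percentile over latency samples, plus the 1-(1-eps)^n
// tail-amplification estimate from the text. Sample data is synthetic.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

double percentile(std::vector<double> samples, double p) {
    std::sort(samples.begin(), samples.end());
    // Nearest-rank definition: smallest t with Pr(L <= t) >= p.
    std::size_t rank = static_cast<std::size_t>(std::ceil(p * samples.size()));
    rank = std::max<std::size_t>(1, std::min(rank, samples.size()));
    return samples[rank - 1];
}

int main() {
    std::vector<double> latencies_ms = {12, 14, 15, 16, 18, 22, 25, 31, 48, 120};

    std::cout << "P50 = " << percentile(latencies_ms, 0.50) << " ms\n";
    std::cout << "P99 = " << percentile(latencies_ms, 0.99) << " ms\n";

    // Tail amplification: with per-component probability eps = 1 - p of
    // exceeding its own P_p, the chance that at least one of n parallel
    // components is slow is 1 - (1 - eps)^n.
    double eps = 0.01;  // each hop exceeds its P99 one percent of the time
    for (int n : {1, 10, 100}) {
        double amplified = 1.0 - std::pow(1.0 - eps, n);
        std::cout << "n=" << n << ": Pr(at least one slow hop) = "
                  << amplified << "\n";
    }
    return 0;
}
```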

5. Experimental Observations and Performance Phenomena

Scalability experiments under hundreds to thousands of users reveal characteristic microservice system behaviors:

  • Median Latency: P50 typically 10–30 ms for light RPCs across benchmarks.
  • Tail Latency: Significant tail amplification:
    • Social network “get timeline” P99 ≈ 80 ms, P99.9 ≈ 120 ms.
    • E-commerce checkout P99 ≈ 150 ms, P99.9 ≈ 300 ms.
  • Scaling Bottlenecks:
    • Throughput increases sublinearly with replicas due to inter-service network contention and head-of-line blocking.
    • Increased OS overhead: Microservices incur ~3× more context switches and syscalls/sec than comparable monoliths.
    • Kubernetes scheduling delay adds 20–50 ms to cold RPC invocations.
    • Packet interleaving and network buffer bloat elevate tail latency by 10–20%.
  • Cascading QoS Effects: Under “power-of-two-choices” load balancing, latency spikes in upstream services propagate and are amplified across the microservice DAG (a minimal sketch of the balancing rule follows this list).
  • Tail at Scale: Application-level SLOs are tightly constrained by P99/P99.9 latencies along request paths that fan out to many parallel microservices.
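
For reference, the sketch below shows the basic power-of-two-choices rule: sample two replicas uniformly at random and route the request to the less loaded one. The replica loads and the simple in-flight counter are invented for illustration; production balancers track richer signals. When an upstream stage slows down uniformly, both sampled choices are slow, which is how a spike can propagate down the DAG.

```cpp
// Power-of-two-choices replica selection: sample two replicas at random and
// route to the one with lower outstanding load. Replica state is synthetic.
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

std::size_t pick_replica(const std::vector<int>& outstanding, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> dist(0, outstanding.size() - 1);
    std::size_t a = dist(rng);
    std::size_t b = dist(rng);
    return (outstanding[a] <= outstanding[b]) ? a : b;
}

int main() {
    std::mt19937 rng(42);
    std::vector<int> outstanding = {3, 0, 7, 1};  // in-flight requests per replica

    for (int i = 0; i < 5; ++i) {
        std::size_t r = pick_replica(outstanding, rng);
        ++outstanding[r];  // account for the newly routed request
        std::cout << "request " << i << " -> replica " << r << "\n";
    }
    return 0;
}
```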

6. Engineering Insights: Asynchronous RPC Handling and System Trade-offs

Efficient asynchronous RPC management is pivotal for microservices performance. DeathStarBench’s original implementation of asynchronous RPC in the social network benchmark used a thread-per-RPC model (C++ std::async/std::future), incurring significant OS overhead:

  • Per RPC call: Each std::async spawns a new kernel thread (clone(2)), leading to high syscall and scheduling contention.
  • Empirically, ComposePost handlers spend ~23% of CPU cycles in clone/exit under high load (Eyerman et al., 2022).

Replacing threads with user-level fibers (Boost.Fiber ≥1.62) alters the scheduling paradigm:

  • Fibers: User-space “microthreads” multiplexed on a core; context switches and scheduling occur without kernel involvement.
  • Performance Results:
    • Peak throughput for ComposePost increases from 15,000 (threaded) to 90,000 (fiber), a 6× gain.
    • Mixed workload sees a 3.6× jump; tail latencies (P99) remain flat at loads where thread-based designs experience sharp increases.
    • Root cause: elimination of kernel-scheduler bottlenecks and reduction of per-RPC overhead.

Implementation trade-offs include reduced core-level parallelism with fibers, portability requirements (Boost.Fiber availability), and more complex debugging/tracing due to user-space schedulers.

The guidelines that emerge: migrate to fiber-based schedulers for workloads with many parallel, I/O-bound RPCs; preserve the familiar future::get() API; and profile clone/exit overhead in the existing deployment to estimate how much the change will help. A sketch contrasting the two models follows.
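
The sketch below, assuming Boost.Fiber is available, contrasts thread-per-RPC via std::async with fiber-per-RPC via boost::fibers::async, both consumed through the same future::get() interface. The fake_rpc body and call count are placeholders, not DeathStarBench's actual ComposePost handler.

```cpp
// Contrast: thread-per-RPC (std::async) vs. user-level fibers (Boost.Fiber).
// The "RPC" here is a placeholder; real handlers would issue network calls.
#include <boost/fiber/all.hpp>
#include <future>
#include <iostream>
#include <vector>

int fake_rpc(int id) {
    // Stand-in for an I/O-bound downstream call.
    return id * 2;
}

int main() {
    constexpr int kCalls = 64;

    // Thread-per-RPC: each std::async(std::launch::async, ...) may clone()
    // a kernel thread, so syscall and scheduler costs grow with fan-out.
    std::vector<std::future<int>> threads;
    for (int i = 0; i < kCalls; ++i)
        threads.emplace_back(std::async(std::launch::async, fake_rpc, i));
    long sum_threads = 0;
    for (auto& f : threads) sum_threads += f.get();

    // Fiber-per-RPC: boost::fibers::async schedules user-space fibers on the
    // calling thread; context switches avoid the kernel entirely, while the
    // future::get() interface stays the same.
    std::vector<boost::fibers::future<int>> fibers;
    for (int i = 0; i < kCalls; ++i)
        fibers.emplace_back(boost::fibers::async(fake_rpc, i));
    long sum_fibers = 0;
    for (auto& f : fibers) sum_fibers += f.get();

    std::cout << sum_threads << " " << sum_fibers << "\n";
    return 0;
}
```

Because these fibers multiplex on the calling thread, the example also illustrates the reduced core-level parallelism trade-off noted above; Boost.Fiber's work-sharing and work-stealing scheduling algorithms can spread fibers across worker threads when that parallelism is needed.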

7. Modularity and Extensibility

Modularity pervades DeathStarBench at all layers:

  • Code Layout: Each microservice includes a Dockerfile, workload generator, and health-checks.
  • Extension Workflow:

1. Add the new service code as a subdirectory.
2. Define Kubernetes Deployment/Service YAML, reusing the existing logging/metrics ConfigMaps.
3. Integrate into the client harness (Locust/Tsung) by declaring new user scenarios.
4. Optionally update the service-graph metadata for automated SLO/tail evaluations.

  • Deployment Aids: Distributed with templated Helm charts, JSON graph descriptions, and scripts for consistency checking and automated profile generation.

This architecture fosters reproducibility, composability, and rapid integration of new research workloads or microarchitectural techniques.


DeathStarBench thus constitutes a principled, empirically validated framework for microservice system-level analysis. By offering realistic, modular service graphs, comprehensive instrumentation, and extensible infrastructure, it has become a cornerstone in the study of microservice architecture performance, predictability, and resource utilization across the cloud and IoT application spectrum (Gan et al., 2019; Eyerman et al., 2022).
