Iso-Quality of Service: Fairly Ranking Servers for Real-Time Data Analytics

Published 14 Jan 2015 in cs.DC | (1501.03481v1)

Abstract: We present a mathematically rigorous Quality-of-Service (QoS) metric which relates the achievable quality of service metric (QoS) for a real-time analytics service to the server energy cost of offering the service. Using a new iso-QoS evaluation methodology, we scale server resources to meet QoS targets and directly rank the servers in terms of their energy-efficiency and by extension cost of ownership. Our metric and method are platform-independent and enable fair comparison of datacenter compute servers with significant architectural diversity, including micro-servers. We deploy our metric and methodology to compare three servers running financial option pricing workloads on real-life market data. We find that server ranking is sensitive to data inputs and desired QoS level and that although scale-out micro-servers can be up to two times more energy-efficient than conventional heavyweight servers for the same target QoS, they are still six times less energy efficient than high-performance computational accelerators.

Summary

The paper introduces the iso-QoS metric for fair, workload-centric ranking of heterogeneous data analytics servers.
The paper demonstrates that Xeon Phi can use 2× to 10× less energy than conventional servers, while scaled-out microservers also show significant energy savings.
The paper highlights that energy and latency trade-offs vary with workload size, vectorization strategy, and kernel type, impacting real-time financial analytics performance.

Iso-Quality of Service: A Fair Ranking Methodology for Real-Time Data Analytics Servers

Motivation and Scope of the Iso-QoS Metric

The challenge of server selection for latency-sensitive real-time analytics workloads in datacenters is complicated by architectural diversity, cost variability, and highly dynamic workloads. Current approaches inadequately compare server platforms due to mismatched performance metrics, heterogeneity, and lack of workload-centric fairness. This paper introduces a mathematically rigorous, platform-agnostic Quality-of-Service (QoS) metric that incorporates performance (seconds per option) and energy (Joules per option) characteristics of real-time financial analytics workloads, independent of hardware parameters. The iso-QoS methodology enables a fair, repeatable ranking of diverse server architectures, capable of capturing the fluctuating demands of low-latency analytics such as option pricing on real market data (1501.03481).

Option Pricing Kernels and Workload Description

The study focuses on canonical financial analytics problems: European option pricing using Monte Carlo (MC) and Binomial Tree (BT) models. MC simulations estimate the option value by sampling lognormal asset paths, with complexity $O(N)$ and heavy reliance on transcendental operations. BT constructs an $(N+1)$-level discrete lattice and is dominated by $O(N^2)$ add-multiply operations. Both are event-driven: every stock price update triggers kernel computation for an array of contracts, and contracts not computed before the next event are discarded.
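As a rough illustration of the two kernels, the sketch below prices a European call in plain Python. The call payoff, the single-step sampling of the terminal price in the MC routine, the Cox-Ross-Rubinstein parameterization of the tree, and all function and parameter names are assumptions made for this example; the paper's tuned, vectorized kernels are not reproduced here.

```python
import math
import random


def mc_european_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Monte Carlo estimate: O(n_paths) work, one exp() per sampled path."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp(drift + vol * z)  # lognormal terminal price
        payoff_sum += max(s_t - k, 0.0)
    return math.exp(-r * t) * payoff_sum / n_paths


def bt_european_call(s0, k, r, sigma, t, n_steps):
    """Binomial tree (assumed CRR form): O(n_steps^2) multiply-adds."""
    dt = t / n_steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)
    # Payoffs at the n_steps + 1 terminal nodes of the lattice.
    values = [max(s0 * u ** j * d ** (n_steps - j) - k, 0.0)
              for j in range(n_steps + 1)]
    # Backward induction, one lattice level at a time.
    for level in range(n_steps, 0, -1):
        values = [disc * (p * values[j + 1] + (1.0 - p) * values[j])
                  for j in range(level)]
    return values[0]
```

Both routines should approach the Black-Scholes value as `n_paths` and `n_steps` grow, which gives a simple sanity check before any timing or energy measurement.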

Experimental Platforms and Methodology

Three server platforms encapsulate the architectural spectrum:

Intel Sandy Bridge (dual Xeon E5-2650, 2 $\times$ 8 $\times$ 1, AVX256): High-performance x86-64, 200W+ power, 2.00 GHz, 32 GB DDR3.
Intel Xeon Phi Knights Corner (1 $\times$ 60 cores, KNC512): Manycore, 60 $\times$ 4-way hyperthreaded, 512-bit vector units, 108–140 W, over PCIe.
Calxeda Viridis (ARM Cortex-A9 microserver, 16 $\times$ 4 $\times$ 1, NEON128): ARM SoC, scale-out, 16 nodes × 4 cores/node, 1.4 GHz, 25–40 W per node.

(Figure 1)

Figure 1: The current supply path to the target CPUs, highlighting the pre-VRM measurement point that keeps energy readings platform-agnostic and comparable.

Real market data (e.g., Facebook and Google stock price updates) are replayed, triggering computation of 617 European option contracts per update. Power readings are logged at comparable CPU supply points across platforms (before the VRM), using RAPL, IPMI, or equivalent interfaces. Experiments are performed in “performance” mode at maximal voltage/frequency on all platforms, with both compiler-generated and hand-optimized vectorization.
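To make the joules-per-option figure concrete, the sketch below integrates logged power samples over time and divides by the number of options priced. The trapezoidal integration, the function names, and the sample values in the usage note are assumptions for illustration; how samples are actually read from RAPL or IPMI is outside this example.

```python
def energy_joules(timestamps_s, power_w):
    """Trapezoidal integration of power samples (watts) over time (seconds)."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need two or more matching time/power samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def joules_per_option(timestamps_s, power_w, options_priced):
    """Average energy spent per priced option over the logged interval."""
    if options_priced <= 0:
        raise ValueError("options_priced must be positive")
    return energy_joules(timestamps_s, power_w) / options_priced
```

For example, with hypothetical samples of a constant 100 W over 2 s during which 400 options are priced, `joules_per_option([0.0, 1.0, 2.0], [100.0, 100.0, 100.0], 400)` gives 0.5 J per option.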

Construction and Role of the QoS Metric

Workload behavior is governed by the stochastic arrival of market events. Arrival times closely follow a Poisson process, validated empirically:

(Figure 2)

Figure 2: Cumulative frequency distributions of stock price updates reveal Poissonian arrival behavior in both Facebook and Google trading sessions.
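One way to examine the Poisson assumption on a trace is to compare the empirical distribution of inter-arrival gaps with the exponential distribution a Poisson process implies. The sketch below computes the largest distance between the two cumulative distributions, with the rate estimated from the mean gap. This is an assumed check for illustration, not the validation procedure used in the paper, and the distance is a descriptive statistic rather than a formal hypothesis test.

```python
import math


def ks_distance_to_exponential(gaps):
    """Max distance between the empirical CDF of gaps and 1 - exp(-rate * t)."""
    n = len(gaps)
    if n == 0 or min(gaps) < 0 or sum(gaps) <= 0:
        raise ValueError("need non-negative gaps with a positive sum")
    rate = n / sum(gaps)  # maximum-likelihood rate: 1 / mean gap
    worst = 0.0
    for i, g in enumerate(sorted(gaps)):
        model = 1.0 - math.exp(-rate * g)
        # The empirical CDF steps from i/n to (i+1)/n at g; check both sides.
        worst = max(worst, abs(model - i / n), abs((i + 1) / n - model))
    return worst
```

A small distance is consistent with Poisson arrivals; perfectly periodic updates (all gaps equal) give a large distance of about 0.63.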

QoS for a workload is defined as the proportion of option contract evaluations completed before the next price event, i.e., “successes” as a fraction of the total required computations. For a desired QoS target, the corresponding minimum permissible gap $G$ between consecutive events is computed. A platform meets the SLA if it prices the full set of contracts within $G$; the total energy expended is then the product of the number of events satisfying the gap constraint, the number of options priced per event, and the joules per option for the kernel and configuration.
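The relationship between the QoS target, the gap, and the energy total can be sketched as follows. This is one reading of the definition above: the gap is taken as the largest value such that at least the target fraction of observed inter-arrival gaps is at least that long. The empirical-quantile construction and all names are assumptions for illustration.

```python
import math


def min_gap_for_qos(gaps, qos_target):
    """Largest gap G such that at least qos_target of the gaps are >= G."""
    if not gaps or not 0.0 < qos_target <= 1.0:
        raise ValueError("need gaps and a QoS target in (0, 1]")
    ordered = sorted(gaps, reverse=True)
    # Number of events that must be served; the small epsilon guards
    # against floating-point round-up in the product.
    k = max(1, math.ceil(qos_target * len(ordered) - 1e-9))
    return ordered[k - 1]


def meets_sla(batch_time_s, gap_limit_s):
    """True if all contracts for one event are priced within the gap."""
    return batch_time_s <= gap_limit_s


def total_energy(gaps, gap_limit_s, options_per_event, joules_per_option):
    """Events satisfying the gap constraint x options x joules per option."""
    served = sum(1 for g in gaps if g >= gap_limit_s)
    return served * options_per_event * joules_per_option
```

Under an exactly exponential gap distribution with rate $\lambda$, the same reading gives $G = -\ln(q)/\lambda$ for a QoS target $q$, so a higher target forces a shorter gap and therefore a faster configuration.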

Iso-QoS-Based Comparative Analysis and Key Findings

Through the iso-QoS methodology, the study restricts comparison to the subset of configurations satisfying the gap constraint for a given QoS target. Several findings are prominent:

Xeon Phi demonstrably dominates energy efficiency, consuming between 2$\times$ and an order of magnitude less energy than the other platforms under most configurations, and its advantage grows at higher QoS levels and larger kernel sizes.
Viridis, despite being a microserver-class SoC, achieves up to 2$\times$ lower energy consumption than the Intel Sandy Bridge server when scaled out, especially as BT kernel sizes grow.
The ranking among servers is sensitive to problem size, vectorization strategy, and kernel (MC vs BT), with the energy/latency trade-off highly dependent on both hardware and application parameters.
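The iso-QoS comparison itself reduces to a filter followed by a sort, as in the minimal sketch below. The `Config` record, its field names, and the configuration labels in the usage note are hypothetical and carry no measured values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str                  # hypothetical label for a server configuration
    batch_time_s: float        # time to price all contracts for one event
    joules_per_option: float   # measured energy per priced option


def iso_qos_ranking(configs, gap_limit_s):
    """Keep configurations that meet the gap, then rank by energy (best first)."""
    feasible = [c for c in configs if c.batch_time_s <= gap_limit_s]
    return sorted(feasible, key=lambda c: c.joules_per_option)
```

With made-up entries `Config("server_a", 0.8, 2.0)`, `Config("server_b", 0.3, 5.0)`, and `Config("server_c", 0.2, 0.5)` and a gap of 0.5 s, `server_a` is excluded for missing the gap and the ranking is `server_c` followed by `server_b`. Tightening the gap changes which configurations are feasible at all, which is one way the ranking can become sensitive to the QoS level.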

(Figure 3)

Figure 3: BT kernel energy consumption scaling (at 80% QoS) demonstrates that Viridis' scale-out approach eventually surpasses Sandy Bridge in energy efficiency with increasing workload complexity.

Contrary to prevailing assumptions, compiler auto-vectorization (AUTOVECT), while reducing execution time, often yields suboptimal energy efficiency and frequently fails to meet the tighter gap constraints imposed by high QoS targets, irrespective of the platform.

Implications for Datacenter Operation and Theoretical Impact

The introduced iso-QoS metric provides datacenter operators with a practical methodology relevant for capacity planning, procurement, and TCO modeling under SLA constraints. By decoupling platform comparison from hardware specifics and focusing on actual service delivered, the methodology enables valid cross-architecture assessments, even as datacenter hardware evolves toward heterogeneity. The clear separation of fixed and variable costs, and dynamic prediction capabilities, allow targeted economic optimization for specific classes of real-time analytics clients.

Theoretically, the iso-QoS model abstracts the event-driven compute-update paradigm, making the approach portable to other “process before next event” analytics workloads. Algorithmic dependencies of platform ranking (e.g., dependence on kernel implementation or vectorization approach) highlight future research avenues in workload-specific HW/SW co-design, especially as microservers incorporate programmable accelerators and datacenters adopt elastic heterogeneous provisioning.

Conclusion

The iso-QoS methodology advances the state-of-the-art in fair, workload-centric server ranking for real-time data analytics by enabling direct, mathematically principled energy and performance comparisons across heterogeneous architectures. Through rigorous modeling and comprehensive empirical analysis on real-world financial workloads, the study demonstrates that microservers, when effectively scaled, can outperform conventional servers in energy efficiency for selected kernels, while specialized manycore accelerators such as Xeon Phi set the bar for minimum energy consumption under high QoS. The iso-QoS approach provides a flexible, extensible foundation for future research into dynamic provisioning, accelerator integration, and broader workload generalization in emerging datacenter architectures.
