- The paper introduces the iso-QoS metric for fair, workload-centric ranking of heterogeneous data analytics servers.
- The paper demonstrates that Xeon Phi can use 2× to 10× less energy than conventional servers, while scaled-out microservers also show significant energy savings.
- The paper highlights that energy and latency trade-offs vary with workload size, vectorization strategy, and kernel type, impacting real-time financial analytics performance.
Iso-Quality of Service: A Fair Ranking Methodology for Real-Time Data Analytics Servers
Motivation and Scope of the Iso-QoS Metric
The challenge of server selection for latency-sensitive real-time analytics workloads in datacenters is complicated by architectural diversity, cost variability, and highly dynamic workloads. Current approaches inadequately compare server platforms due to mismatched performance metrics, heterogeneity, and lack of workload-centric fairness. This paper introduces a mathematically rigorous, platform-agnostic Quality-of-Service (QoS) metric that incorporates performance (seconds per option) and energy (Joules per option) characteristics of real-time financial analytics workloads, independent of hardware parameters. The iso-QoS methodology enables a fair, repeatable ranking of diverse server architectures, capable of capturing the fluctuating demands of low-latency analytics such as option pricing on real market data (1501.03481).
Option Pricing Kernels and Workload Description
The study focuses on canonical financial analytics problems: European option pricing using Monte Carlo (MC) and Binomial Tree (BT) models. MC simulations estimate option value via lognormal asset path sampling, with complexity O(N) and reliance on transcendental operations. BT constructs an (N+1)-level discrete lattice, dominated by O(N^2) add-multiply operations. Both are event-driven: every stock price update triggers kernel computation for an array of contracts, and contracts not computed before the next event are discarded.
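To make the cost contrast between the two kernels concrete, here is a minimal Python sketch of both for a European call. It is not the paper's implementation: the Cox-Ross-Rubinstein parametrization of the tree, the function names, and the parameter names are assumptions chosen for illustration, and the code is scalar rather than vectorized.

```python
import math
import random


def mc_european_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Monte Carlo price: O(n_paths) work, one exp() and one Gaussian draw per path."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(n_paths):
        # Lognormal terminal asset price for one sampled path
        st = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff_sum += max(st - k, 0.0)
    return math.exp(-r * t) * payoff_sum / n_paths


def bt_european_call(s0, k, r, sigma, t, n_steps):
    """Binomial tree (CRR parametrization, an assumption): O(n_steps^2) multiply-adds."""
    dt = t / n_steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)
    # Payoffs at the n_steps + 1 terminal nodes of the lattice
    values = [max(s0 * u ** j * d ** (n_steps - j) - k, 0.0)
              for j in range(n_steps + 1)]
    # Backward induction: each level has one node fewer than the one after it
    for level in range(n_steps, 0, -1):
        values = [disc * (p * values[j + 1] + (1.0 - p) * values[j])
                  for j in range(level)]
    return values[0]


mc_price = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=20000)
bt_price = bt_european_call(100.0, 100.0, 0.05, 0.2, 1.0, n_steps=500)
```

The MC loop is dominated by `exp` and Gaussian sampling, while the BT backward induction is a nest of multiply-adds, which matches the operation mix described above.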
Three server platforms encapsulate the architectural spectrum:
- Intel Sandy Bridge (dual Xeon E5-2650, 2×8×1, AVX256): High-performance x86-64, over 200 W power, 2.00 GHz, 32 GB DDR3.
- Intel Xeon Phi Knights Corner (1×60 cores, KNC512): Manycore, 60 × 4-way hyperthreaded, 512-bit vector units, 108–140 W, over PCIe.
- Calxeda Viridis (ARM Cortex-A9 microserver, 16×4×1, NEON128): ARM SoC, scale-out, 16 nodes × 4 cores/node, 1.4 GHz, 25–40 W per node.
(Figure 1)
Figure 1: The current supply path to the target CPUs. Measuring before the VRM (the pre-VRM point) gives platform-agnostic, comparable energy readings.
Live market data (e.g., Facebook and Google stock price updates) are replayed, triggering computation of 617 European option contracts per update. Power readings are logged at comparable CPU supply points across platforms (before the VRM), using RAPL, IPMI, or equivalent interfaces. Experiments are performed in “performance” mode at maximal voltage/frequency on all platforms, with both compiler-generated and hand-optimized vectorization.
Construction and Role of the QoS Metric
Workload behavior is governed by the stochastic arrival of market events. Arrival times closely follow a Poisson process, validated empirically:
(Figure 2)
Figure 2: Cumulative frequency distributions of stock price updates reveal Poissonian arrival behavior in both Facebook and Google trading sessions.
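One way to check the Poisson assumption on a timestamp trace is to compare the empirical distribution of inter-arrival gaps with a fitted exponential distribution. The sketch below is only an illustration of that idea, not the paper's validation procedure: the function name is hypothetical, and the timestamps are synthetic stand-ins generated with a seeded random generator rather than real market data.

```python
import math
import random


def max_cdf_deviation(timestamps):
    """Return (fitted rate, largest distance between empirical and exponential CDF)."""
    gaps = sorted(b - a for a, b in zip(timestamps, timestamps[1:]))
    n = len(gaps)
    rate = n / sum(gaps)  # maximum-likelihood estimate of the arrival rate
    worst = 0.0
    for i, g in enumerate(gaps):
        model = 1.0 - math.exp(-rate * g)
        # Compare with the empirical CDF just before and just after this gap
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return rate, worst


# Synthetic stand-in for a replayed session (assumed rate: 2 updates per second)
rng = random.Random(42)
t, stamps = 0.0, []
for _ in range(5000):
    t += rng.expovariate(2.0)
    stamps.append(t)

rate, deviation = max_cdf_deviation(stamps)
```

A small deviation indicates that exponential inter-arrival times, and hence a Poisson arrival process, describe the trace well; a large one suggests the model does not fit.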
QoS for a workload is defined as the proportion of option contract evaluations completed before the next price event, i.e., “successes” versus total required computations. For a desired QoS target, the corresponding minimum permissible gap G between consecutive events is computed. A platform meets the SLA if the time it needs to price all contracts for one update does not exceed G; the total energy expended is then the product of the number of events satisfying the gap constraint, the number of options priced, and the joules per option for the kernel and configuration.
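The relationship between the QoS target, the gap G, and total energy can be sketched in a few lines of Python. Two assumptions are made here and are not taken from the paper: inter-arrival times are treated as exponential with a known rate, so G solves exp(-rate × G) = QoS, and the SLA check is read as "time to price all contracts for one update is no greater than G". All function and parameter names are hypothetical.

```python
import math


def gap_for_qos(rate_per_s, qos):
    """Gap G such that a fraction `qos` of exponential inter-arrivals is at least G."""
    return -math.log(qos) / rate_per_s


def meets_sla(seconds_per_option, n_options, gap_s):
    """Assumed SLA reading: one full batch of options must finish within the gap."""
    return seconds_per_option * n_options <= gap_s


def total_energy_joules(gaps_s, gap_s, n_options, joules_per_option):
    """Energy = (events whose gap satisfies the constraint) x options x J/option."""
    qualifying = sum(1 for g in gaps_s if g >= gap_s)
    return qualifying * n_options * joules_per_option


# Illustrative inputs only, not measurements
g80 = gap_for_qos(rate_per_s=2.0, qos=0.8)
fast_ok = meets_sla(seconds_per_option=1e-4, n_options=617, gap_s=g80)
slow_ok = meets_sla(seconds_per_option=1e-3, n_options=617, gap_s=g80)
energy = total_energy_joules([0.05, 0.2, 0.5], 0.1, 617, 0.01)
```

A higher QoS target shrinks G, so fewer configurations pass the check; that is what makes the comparison "iso-QoS": only configurations delivering the same service level are compared on energy.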
Iso-QoS-Based Comparative Analysis and Key Findings
Through the iso-QoS methodology, the study restricts comparison to the subset of configurations satisfying the gap constraint for a given QoS target. Several findings are prominent:
- Xeon Phi demonstrably dominates energy efficiency, consuming between 2× and one order of magnitude less energy than the other platforms under most configurations and increasing its competitive advantage at higher QoS levels and larger kernel sizes.
- Viridis, despite being a microserver-class SoC, achieves up to 2× lower energy consumption compared to the Intel Sandy Bridge server when scaled-out, especially as BT kernel sizes grow.
- The ranking among servers is sensitive to problem size, vectorization strategy, and kernel (MC vs BT), with the energy/latency trade-off highly dependent on both hardware and application parameters.
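The ranking step itself is a filter followed by a sort: drop every configuration that cannot finish a batch within the gap for the chosen QoS target, then order the survivors by energy per option. The sketch below illustrates this under the assumption that feasibility means "time for one batch is no greater than the gap". The `Config` type, the configuration names, and all numbers are made-up placeholders, not measurements from the study.

```python
from dataclasses import dataclass


@dataclass
class Config:
    name: str
    seconds_per_option: float
    joules_per_option: float


def iso_qos_rank(configs, n_options, gap_s):
    """Keep configurations that meet the gap constraint, ranked by J/option."""
    feasible = [c for c in configs
                if c.seconds_per_option * n_options <= gap_s]
    return sorted(feasible, key=lambda c: c.joules_per_option)


# Placeholder configurations (hypothetical values for illustration only)
configs = [
    Config("config-a", 1.0e-4, 0.020),
    Config("config-b", 5.0e-5, 0.050),
    Config("config-c", 4.0e-4, 0.005),  # most frugal, but too slow for this gap
]
ranking = [c.name for c in iso_qos_rank(configs, n_options=617, gap_s=0.1)]
```

Note that the most energy-frugal placeholder is excluded because it misses the gap constraint, which is why a ranking can change with the QoS target and kernel size.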
(Figure 3)
Figure 3: BT kernel energy consumption scaling (at 80% QoS) demonstrates that Viridis' scale-out approach eventually surpasses Sandy Bridge in energy efficiency with increasing workload complexity.
Contrary to prevailing assumptions, compiler auto-vectorization (AUTOVECT), while reducing execution time, often yields non-optimal energy efficiency and frequently fails to meet the lower-latency gap constraints imposed by high QoS targets, irrespective of the platform.
Implications for Datacenter Operation and Theoretical Impact
The introduced iso-QoS metric provides datacenter operators with a practical methodology relevant for capacity planning, procurement, and TCO modeling under SLA constraints. By decoupling platform comparison from hardware specifics and focusing on actual service delivered, the methodology enables valid cross-architecture assessments, even as datacenter hardware evolves toward heterogeneity. The clear separation of fixed and variable costs, and dynamic prediction capabilities, allow targeted economic optimization for specific classes of real-time analytics clients.
Theoretically, the iso-QoS model abstracts the event-driven compute-update paradigm, making the approach portable to other “process before next event” analytics workloads. Algorithmic dependencies of platform ranking (e.g., dependence on kernel implementation or vectorization approach) highlight future research avenues in workload-specific HW/SW co-design, especially as microservers incorporate programmable accelerators and datacenters adopt elastic heterogeneous provisioning.
Conclusion
The iso-QoS methodology advances the state of the art in fair, workload-centric server ranking for real-time data analytics by enabling direct, mathematically principled energy and performance comparisons across heterogeneous architectures. Through rigorous modeling and comprehensive empirical analysis on real-world financial workloads, the study demonstrates that microservers, when effectively scaled, can outperform conventional servers in energy efficiency for selected kernels, while specialized manycore accelerators such as Xeon Phi set the bar for minimum energy consumption under high QoS. The iso-QoS approach provides a flexible, extensible foundation for future research into dynamic provisioning, accelerator integration, and broader workload generalization in emerging datacenter architectures.