- The paper introduces RIoTBench, a benchmark suite with 27 micro and 4 application benchmarks designed to evaluate DSPS performance for IoT workloads.
- It simulates realistic IoT scenarios through tasks such as data parsing, statistical analytics, and storage operations, run over diverse sensor datasets with temporal and spatial scaling factors.
- Empirical validation on Apache Storm provides detailed performance metrics, offering actionable insights to optimize resource management in IoT environments.
Evaluating RIoTBench: A Benchmark Suite for Streaming IoT Applications
The paper "RIoTBench: A Real-time IoT Benchmark for Distributed Stream Processing Platforms" introduces a benchmark suite specifically designed for evaluating Distributed Stream Processing Systems (DSPS) in the context of Internet of Things (IoT) applications. The authors recognize that IoT's requirement for low-latency processing and scalable data analysis makes DSPS an essential part of IoT software architectures deployed in cloud environments. Existing DSPS platforms, such as Apache Storm, Flink, and Spark Streaming, while popular for various real-time data applications, have not been rigorously assessed for their performance in IoT-specific scenarios. RIoTBench aims to fill this gap by providing a structured set of benchmarks that simulate realistic IoT workloads and data characteristics.
RIoTBench comprises two primary components: a collection of 27 micro-benchmarks and 4 larger IoT application benchmarks. These evaluate common tasks such as data parsing, statistical and predictive analytics, and data storage operations, all integral to IoT applications. For instance, the micro-benchmarks test a DSPS's performance at parsing SenML and XML messages, applying Kalman filters, running multivariate linear regression, and writing to cloud-hosted NoSQL stores. The tasks are selected and classified by functionality and by their relevance to IoT scenarios, which typically involve high-velocity data streams generated by a multitude of sensors.
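To make the flavor of these micro-benchmark tasks concrete, here is a minimal single-variable Kalman filter of the kind used to smooth noisy sensor readings. This is an illustrative sketch, not RIoTBench's actual implementation; the class and parameter names are invented.

```java
// Minimal 1-D Kalman filter, illustrative of the smoothing tasks that
// RIoTBench's analytics micro-benchmarks exercise. Names are hypothetical.
public class ScalarKalmanFilter {
    private double estimate;                 // current state estimate x
    private double errorCov;                 // estimate error covariance P
    private final double processNoise;       // Q
    private final double measurementNoise;   // R

    public ScalarKalmanFilter(double initialEstimate, double initialErrorCov,
                              double processNoise, double measurementNoise) {
        this.estimate = initialEstimate;
        this.errorCov = initialErrorCov;
        this.processNoise = processNoise;
        this.measurementNoise = measurementNoise;
    }

    /** Fold one sensor reading into the running estimate. */
    public double update(double measurement) {
        // Predict: with an identity state model, only the uncertainty grows.
        errorCov += processNoise;
        // Update: blend prediction and measurement via the Kalman gain.
        double gain = errorCov / (errorCov + measurementNoise);
        estimate += gain * (measurement - estimate);
        errorCov *= (1.0 - gain);
        return estimate;
    }

    public static void main(String[] args) {
        ScalarKalmanFilter kf = new ScalarKalmanFilter(0.0, 1.0, 1e-4, 0.25);
        for (double reading : new double[]{10.2, 9.8, 10.5, 10.1}) {
            System.out.printf("smoothed = %.3f%n", kf.update(reading));
        }
    }
}
```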
The application benchmarks, in turn, simulate real-world IoT pipelines composed from the micro-benchmark tasks. The four benchmarks—ETL (Extract-Transform-Load), Statistical Summarization (STATS), Model Training (TRAIN), and Predictive Analytics (PRED)—span the spectrum from data ingestion to action recommendation. They exercise the dataflow semantics and composition patterns supported by DSPS, such as transform, filter, and aggregate, mirroring actual IoT application requirements.
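As a rough illustration of the filter pattern on Apache Storm, the platform used in the paper's evaluation, a bolt like the one below drops out-of-range readings before downstream analytics. The field names and thresholds are hypothetical; RIoTBench's actual bolts differ.

```java
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

/** Drops readings outside a plausible range before downstream analytics. */
public class RangeFilterBolt extends BaseBasicBolt {
    private final double min, max;

    public RangeFilterBolt(double min, double max) {
        this.min = min;
        this.max = max;
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        double value = input.getDoubleByField("value");
        if (value >= min && value <= max) {   // filter semantics: pass or drop
            collector.emit(new Values(input.getStringByField("sensorId"), value));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sensorId", "value"));
    }
}
```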
The benchmark suite draws on four distinct IoT datasets representative of different smart domains: environmental monitoring (CITY), personal fitness sensing (FIT), smart grid energy usage (GRID), and smart transportation (TAXI). A notable methodological contribution is the introduction of temporal and spatial scaling factors, which scale these real-world traces up to simulate contemporary IoT environments where sensors are deployed far more densely.
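In outline, temporal scaling compresses inter-event gaps to raise the input rate, while spatial scaling replicates each event under synthetic sensor IDs to mimic a denser deployment. The sketch below assumes a simple timestamped-event schema; the record and method names are invented for illustration and are not RIoTBench's schema.

```java
import java.util.ArrayList;
import java.util.List;

/** One timestamped sensor event; fields are illustrative only. */
record Event(long timestampMs, String sensorId, double value) {}

public class ScaledReplay {
    /**
     * Temporal scaling: divide inter-event gaps by tempFactor so the trace
     * plays back tempFactor times faster. Spatial scaling: emit spatFactor
     * copies of each event under synthetic sensor IDs.
     */
    static List<Event> scale(List<Event> trace, double tempFactor, int spatFactor) {
        List<Event> scaled = new ArrayList<>();
        long origin = trace.isEmpty() ? 0 : trace.get(0).timestampMs();
        for (Event e : trace) {
            // Compress the offset from the first event by the temporal factor.
            long newTs = origin + (long) ((e.timestampMs() - origin) / tempFactor);
            for (int i = 0; i < spatFactor; i++) {
                scaled.add(new Event(newTs, e.sensorId() + "-" + i, e.value()));
            }
        }
        return scaled;
    }
}
```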
Implementation and validation of RIoTBench are performed on the Apache Storm platform, demonstrating its utility across a range of IoT workloads with different input rates and data distributions. The empirical results are carefully presented, with detailed metrics such as throughput, latency, CPU/memory utilization, and jitter, which are crucial for understanding the performance limits and strengths of DSPS under IoT-specific conditions. For example, SenML parsing exhibits high throughput with low CPU usage, while XML parsing proves CPU-intensive, supporting the preference for JSON-based serialization in IoT systems.
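For readers wanting to reproduce such measurements, a coarse sketch of deriving throughput and mean latency from per-tuple emit/ack timestamps might look as follows. This is a simplification; the paper's own metric definitions (notably its jitter formula) should be consulted for the exact semantics.

```java
import java.util.List;

/** Coarse throughput/latency computation from per-tuple timestamps. */
public class StreamMetrics {
    /** Each entry is a pair {emitMs, ackMs} for one processed tuple. */
    static void report(List<long[]> tuples) {
        if (tuples.isEmpty()) {
            System.out.println("no tuples observed");
            return;
        }
        long first = Long.MAX_VALUE, last = Long.MIN_VALUE, latencySum = 0;
        for (long[] t : tuples) {
            first = Math.min(first, t[0]);
            last = Math.max(last, t[1]);
            latencySum += t[1] - t[0];   // end-to-end latency of one tuple
        }
        double seconds = Math.max(last - first, 1) / 1000.0;
        System.out.printf("throughput   = %.1f tuples/s%n", tuples.size() / seconds);
        System.out.printf("mean latency = %.1f ms%n",
                (double) latencySum / tuples.size());
    }
}
```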
RIoTBench's importance extends beyond its immediate utility for evaluating specific DSPS platforms; it provides a standard, consistent framework for assessing different systems' capabilities to handle the unique demands of IoT applications. By supplying a common foundation, RIoTBench aids researchers and developers in identifying and resolving bottlenecks in their systems, optimizing resource management, and ultimately enhancing the efficiency and reliability of IoT applications.
The future potential of RIoTBench lies in its adaptability to emerging IoT application domains and advances in DSPS technologies. Future work could integrate pattern detection tasks and notification systems into the benchmark suite, widening its scope. Additionally, evaluating other DSPS such as Apache Flink on the same benchmarks would further refine our understanding of how fast data processing platforms serve IoT workloads.
RIoTBench represents a substantial contribution to the field of IoT and distributed data processing, providing a rigorous and practical tool for advancing the efficacy of DSPS platforms in IoT settings. As IoT continues to expand, benchmark suites like RIoTBench will be instrumental in driving innovations that meet the sector's evolving demands.