Machine learning for Internet of Things data analysis: A survey (1802.06305v1)

Published 17 Feb 2018 in cs.LG, cs.CY, and cs.DC

Abstract: Rapid developments in hardware, software, and communication technologies have allowed the emergence of Internet-connected sensory devices that provide observation and data measurement from the physical world. By 2020, it is estimated that the total number of Internet-connected devices being used will be between 25 and 50 billion. As the numbers grow and technologies become more mature, the volume of data published will increase. Internet-connected devices technology, referred to as Internet of Things (IoT), continues to extend the current Internet by providing connectivity and interaction between the physical and cyber worlds. In addition to increased volume, the IoT generates Big Data characterized by velocity in terms of time and location dependency, with a variety of multiple modalities and varying data quality. Intelligent processing and analysis of this Big Data is the key to developing smart IoT applications. This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case. The key contribution of this study is presentation of a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher level information. The potential and challenges of machine learning for IoT data analytics will also be discussed. A use case of applying Support Vector Machine (SVM) on Aarhus Smart City traffic data is presented for a more detailed exploration.

Authors

Mohammad Saeid Mahdavinejad
Mohammadreza Rezvan
Mohammadamin Barekatain
Peyman Adibi
Payam Barnaghi
Amit P. Sheth

Summary

Machine Learning for Internet of Things Data Analysis: A Survey

The paper, "Machine Learning for Internet of Things Data Analysis: A Survey," provides a comprehensive evaluation of the intersection between rapidly expanding IoT technologies and machine learning methodologies. The discussion focuses on the critical role that machine learning algorithms play in addressing the challenges posed by the diverse and voluminous data generated by IoT devices, with a specific emphasis on smart cities as a primary use case.

Overview

The document begins by acknowledging the rapid growth of Internet-connected devices and the consequent increase in data generation, citing estimates of between 25 and 50 billion devices by 2020. This landscape of interconnected devices, termed the Internet of Things (IoT), produces "Big Data" characterized by its volume, velocity, variety, and varying data quality. Effective analysis of this data is presented as essential for developing smart IoT applications.

Key Contributions

The paper's key contributions are outlined as follows:

Taxonomy of Machine Learning Algorithms: The authors present a detailed taxonomy explaining how different machine learning techniques are applied to IoT data to derive actionable insights.
Assessment of IoT Data Characteristics: They examine real-world IoT data characteristics and the associated challenges.
Smart City Use Case Analysis: The paper explores smart cities as an exemplary use case for IoT applications and discusses how machine learning can handle the data-driven demands of such applications.
Practical Implementations: A practical use case involving the application of Support Vector Machine (SVM) on Aarhus Smart City traffic data demonstrates the real-world implications and utility of these techniques.

Detailed Analysis

IoT Applications and Data Characteristics

The authors group IoT applications into categories such as Smart Energy, Smart Mobility, Smart Citizen, and Urban Planning, detailing the types of data each application generates and processes. The inherent characteristics of IoT data (high volume, real-time generation, heterogeneity, and a dynamic nature) are discussed in depth, since they pose significant challenges for data processing and analysis methods.

Computing Frameworks

The paper discusses various computing frameworks pivotal for IoT data processing, including Fog Computing, Edge Computing, and Cloud Computing, noting their respective advantages and limitations. For example, fog and edge computing are favored for their lower latency and reduced network load, whereas cloud computing is advantageous for its high computational power but suffers from latency issues.
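The reduced network load attributed to fog and edge computing can be illustrated with a minimal sketch. It assumes, purely for illustration, that an edge node averages raw sensor readings over fixed windows and uploads only the averages. The function name `aggregate_at_edge`, the window size, and the temperature values are hypothetical and do not come from the paper.

```python
from statistics import mean

def aggregate_at_edge(readings, window):
    """Collapse each run of `window` raw readings into one mean value."""
    return [
        mean(readings[i:i + window])
        for i in range(0, len(readings), window)
    ]

# Hypothetical raw temperature readings collected at an edge node.
raw = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5, 23.0, 23.5]

# Only the window means would be uploaded to the cloud.
summary = aggregate_at_edge(raw, window=4)
print(len(raw), "raw readings ->", len(summary), "uploaded values:", summary)
```

Here eight readings shrink to two uploaded values. The trade-off is that the cloud no longer sees the raw signal, so the window size has to match what downstream analysis needs.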

Machine Learning Algorithms

A methodical categorization of machine learning algorithms is provided, including:

Classification Algorithms: K-Nearest Neighbors (KNN), Naive Bayes, and Support Vector Machine (SVM)
Regression Algorithms: Linear Regression and Support Vector Regression (SVR)
Combining Models: Random Forests, Bagging
Clustering: K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
Feature Extraction: Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA)
Specific Neural Networks: Feed Forward Neural Network (FFNN)

Each algorithm is analyzed for its suitability in different IoT data contexts, with emphasis on its theoretical foundations, practical applications, and algorithmic complexity.
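To make one of the listed classification methods concrete, here is a minimal sketch of K-Nearest Neighbors using only the Python standard library. The sensor readings, the labels, and the function name `knn_predict` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical sensor readings: (temperature in C, relative humidity in %).
readings = [(21.0, 40.0), (22.5, 42.0), (20.5, 38.0),
            (30.0, 70.0), (31.5, 72.0), (29.0, 68.0)]
labels = ["comfortable", "comfortable", "comfortable",
          "humid", "humid", "humid"]

print(knn_predict(readings, labels, (22.0, 41.0)))  # comfortable
print(knn_predict(readings, labels, (30.5, 71.0)))  # humid
```

KNN stores every training point and compares each query against all of them, so prediction cost grows with the size of the data set. That cost matters for high-volume IoT streams.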

Practical Implementation and Use Case

The use case provided—predicting traffic patterns using SVM on Aarhus Smart City data—demonstrates the applicability of machine learning algorithms to real-world scenarios. This implementation underlines the practical challenges, such as the need for scalable algorithms capable of real-time processing and handling diverse data modalities.
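The summary does not describe how the authors configured their SVM, so the following is only a rough sketch of the general technique: a linear soft-margin SVM trained by subgradient descent on the regularised hinge loss. The data is synthetic and hand-written, not the Aarhus traffic data set. The two features (normalised vehicle count and normalised average speed), the labels, and the hyperparameters `lr`, `lam`, and `epochs` are all assumptions made for illustration.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=200):
    """Fit w, b by subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with enough margin: only the regulariser shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 (congested) or -1 (free-flowing) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Synthetic samples: (normalised vehicle count, normalised average speed).
samples = [(0.9, 0.2), (0.8, 0.1), (0.85, 0.3), (0.7, 0.25),   # congested
           (0.2, 0.8), (0.1, 0.9), (0.3, 0.7), (0.25, 0.85)]   # free-flowing
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
print(predict(w, b, (0.95, 0.15)))  # many vehicles, low speed
print(predict(w, b, (0.15, 0.95)))  # few vehicles, high speed
```

A real deployment would more likely use an established SVM library and a kernel suited to the data. The sketch only shows the margin-based update that underlies the method.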

Discussion

The discussion section elucidates how the application specifics and data characteristics guide the choice of appropriate machine learning algorithms. For instance, algorithms like K-Means and DBSCAN are recommended for structure discovery, while One-class SVM is suggested for anomaly detection. The paper underscores the importance of understanding both the data characteristics and algorithmic intricacies to make informed decisions in analyzing IoT data.
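As an illustration of structure discovery, the sketch below implements a basic version of DBSCAN with the Python standard library. The sensor coordinates and the parameters `eps` and `min_pts` are hypothetical. DBSCAN labels isolated points as noise, which can hint at outliers, but this is not the One-class SVM approach the paper suggests for anomaly detection.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # core point: keep growing the cluster
    return labels

# Hypothetical 2-D sensor locations: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
          (9.0, 0.5)]

cluster_labels = dbscan(points, eps=0.5, min_pts=3)
print(cluster_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike K-Means, DBSCAN does not need the number of clusters in advance, but its results depend on the choice of `eps` and `min_pts`.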

Research Trends and Open Issues

The paper concludes with a discussion of future research directions and unresolved issues in the field. Critical areas needing attention include improving the quality of IoT data (addressing issues like noise and data integration), ensuring privacy and security within IoT frameworks, and developing algorithms that can handle the scale and heterogeneity of IoT data efficiently.

Conclusion

This survey explores machine learning's role in IoT data analysis, balancing theoretical insights with practical implementations. It is aimed at researchers and practitioners who want to apply data analytics to IoT applications.

References for further reading on each algorithm and concept discussed are available in the paper.
