- The paper introduces LFZip, a novel error-bounded lossy compression tool for multivariate floating-point time series data via improved prediction.
- LFZip employs a prediction-quantization-entropy coding framework using advanced linear and neural network predictors to achieve error-bounded compression.
- Evaluations show LFZip achieves superior compression ratios compared to state-of-the-art tools, particularly on multivariate datasets, making it efficient for applications like edge computing.
Analysis of LFZip: A Compressor for Multivariate Floating-Point Time Series Data
The paper "LFZip: Lossy compression of multivariate floating-point time series data via improved prediction" introduces LFZip, a novel error-bounded lossy compression tool for multivariate floating-point time series. Such data is becoming increasingly common with the proliferation of IoT devices and data-driven applications, making efficient compression with accuracy guarantees critical.
Key Contributions and Methodology
LFZip employs a prediction-quantization-entropy coding framework, enhanced with stronger prediction mechanisms such as adaptive linear models and neural networks (NNs). The user specifies a maximum absolute error, and the algorithm guarantees that every reconstructed value stays within that bound of the original, which is crucial for preserving the integrity of data in downstream applications.
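The error-bound guarantee follows from a uniform quantizer whose step size is twice the allowed error: rounding the prediction error to the nearest step keeps the reconstruction within the bound. A minimal sketch of this loop (entropy coding omitted; `compress_stream` and its `predict` callback are illustrative names, not LFZip's actual API):

```python
import numpy as np

def compress_stream(x, predict, max_error):
    """Sketch of the predict-quantize-reconstruct loop with an absolute
    error bound. `predict` is any causal predictor over past
    reconstructions; the integer symbols would go to an entropy coder."""
    step = 2 * max_error                 # quantizer step derived from the error bound
    recon = np.empty_like(x)
    symbols = np.empty(len(x), dtype=np.int64)
    for i in range(len(x)):
        pred = predict(recon[:i])        # causal: sees past reconstructions only
        q = int(np.round((x[i] - pred) / step))  # quantized prediction error
        symbols[i] = q
        recon[i] = pred + q * step       # decoder reproduces this exactly
        # |x[i] - recon[i]| <= step / 2 = max_error by construction
    return symbols, recon
```

Because the encoder predicts from reconstructed (not original) values, the decoder can replay the identical predictions and recover `recon` from the symbols alone.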
- Prediction Models: LFZip supports both the Normalized Least Mean Squares (NLMS) predictor and neural network-based predictors. The NLMS predictor adapts a linear filter to the data online, while the neural network predictors use fully connected and biGRU architectures intended to capture more complex patterns in the time series.
- Framework Components: Encoding predicts each data point with a causal model, quantizes the prediction error using a step size derived from the maximum allowable error, and entropy-codes the quantized values. LFZip uses BSC (a block-sorting compressor) as the entropy coding stage; its strong general-purpose performance helps maximize the overall compression ratio.
- Multivariate Data Compression: LFZip extends its functionality to compress multivariate datasets by taking advantage of inter-variable correlations. This approach promises significant gains in scenarios where variables within the dataset exhibit co-dependency.
- Computational Efficiency: LFZip (NLMS) achieves relatively high encoding and decoding throughput on univariate data, while the neural network predictors incur considerably more computational overhead.
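The NLMS predictor in the list above can be sketched in a few lines: it predicts each sample as a weighted sum of the previous ones and then nudges the weights by a gradient step normalized by the context's energy. This is a minimal illustration with assumed hyperparameters (`order`, `mu`, `eps` are placeholders, not LFZip's actual settings):

```python
import numpy as np

def nlms_predict_stream(x, order=4, mu=0.5, eps=1e-6):
    """Minimal NLMS predictor sketch: predict x[i] from the previous
    `order` samples, then update the filter weights with a normalized
    LMS step based on the prediction error."""
    w = np.zeros(order)
    preds = np.zeros(len(x))
    for i in range(len(x)):
        ctx = x[max(0, i - order):i][::-1]        # most recent sample first
        ctx = np.pad(ctx, (0, order - len(ctx)))  # zero-pad the early context
        preds[i] = w @ ctx
        err = x[i] - preds[i]
        w += mu * err * ctx / (ctx @ ctx + eps)   # normalized LMS update
    return preds
```

The normalization by `ctx @ ctx` is what makes the step size scale-invariant, letting the same `mu` work across signals of very different magnitudes.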
Empirical Evaluation
The paper's extensive experimental evaluation underscores LFZip's superior performance over existing state-of-the-art lossy compressors such as SZ. On a range of datasets, LFZip achieves marked improvements in compression ratios, particularly where datasets deviate from simple linear or polynomial representations.
Results show that for datasets like "gas" and "sen," LFZip's ability to leverage multivariate correlations yields significant compression gains. On univariate data, LFZip (NLMS) generally outperforms competitors, especially where adaptive causal prediction captures structure that fixed curve-fitting models miss.
Implications and Future Directions
The practical implications of LFZip's enhanced lossy compression are substantial. As the tool allows efficient handling of large-scale time series data with error-bounded guarantees, it represents a valuable advancement for edge computing devices that are typically limited in processing power and bandwidth.
Theoretically, this research extends the scope of error-bounded compression by showcasing how incremental advances in prediction models can result in meaningful gains in compression efficiency. It also opens pathways for examining other sophisticated prediction frameworks within the compression paradigm.
Future developments could focus on optimizing the neural network implementation for increased efficiency and exploring broader types of datasets such as high-dimensional scientific data. Additionally, integrating adaptive learning techniques to improve the generalizability and adaptability of prediction models in real-time data processing scenarios could further enhance LFZip's utility.
By addressing both methodological rigor and practical relevance, LFZip represents a noteworthy step forward in time series data compression, supporting the continued evolution of data-driven technologies in an era of burgeoning data volumes and diversity.