A Comparative Study of Delta Parquet, Iceberg, and Hudi for Automotive Data Engineering Use Cases (2508.13396v1)
Abstract: The automotive industry generates vast amounts of data from sensors, telemetry, diagnostics, and real-time operations. Efficient data engineering is critical to handle challenges of latency, scalability, and consistency. Modern data lakehouse formats Delta Parquet, Apache Iceberg, and Apache Hudi offer features such as ACID transactions, schema enforcement, and real-time ingestion, combining the strengths of data lakes and warehouses to support complex use cases. This study presents a comparative analysis of Delta Parquet, Iceberg, and Hudi using real-world time-series automotive telemetry data with fields such as vehicle ID, timestamp, location, and event metrics. The evaluation considers modeling strategies, partitioning, CDC support, query performance, scalability, data consistency, and ecosystem maturity. Key findings show Delta Parquet provides strong ML readiness and governance, Iceberg delivers high performance for batch analytics and cloud-native workloads, while Hudi is optimized for real-time ingestion and incremental processing. Each format exhibits tradeoffs in query efficiency, time-travel, and update semantics. The study offers insights for selecting or combining formats to support fleet management, predictive maintenance, and route optimization. Using structured datasets and realistic queries, the results provide practical guidance for scaling data pipelines and integrating machine learning models in automotive applications.
Collections
Sign up for free to add this paper to one or more collections.