Accelerating Fresh Data Exploration with Fluid ETL Pipelines

Published 23 Mar 2026 in cs.DB | (2603.22220v1)

Abstract: There is a growing need for fresh data exploration, in which data analysts explore the main characteristics of data that is still being actively collected, or detect anomalies in it. In addition to the common challenges of classic data exploration, such as a lack of prior knowledge about the data or the analysis goal, fresh data exploration demands an ingestion system with enough throughput to keep up with rapid data accumulation. However, using traditional Extract-Transform-Load (ETL) pipelines to achieve low query latency can be extremely resource-intensive, because they must run an excessive number of data preprocessing routines (DPRs), such as parsing and indexing, to cover unpredictable data characteristics and analysis goals. To overcome this challenge, we approach the problem from a different angle: leveraging occasional idle system capacity or cheap preemptible resources (e.g., Amazon EC2 Spot Instances) during ingestion. In particular, we introduce a new type of data ingestion system called fluid ETL pipelines, which allow users to start and stop arbitrary DPRs on demand without blocking data ingestion. With fluid ETL pipelines, users can start potentially useful DPRs whenever idle or cheap resources are available, in order to accelerate future exploration queries. Moreover, when resources are limited, users can dynamically change which DPRs run as their interests evolve. Our experiments on a real-world dataset confirm that this vision is viable. Fluid ETL pipelines also raise new challenges in essential tasks such as ad-hoc query processing, DPR generation, and DPR management. In this paper, we discuss these open research challenges in detail and outline potential directions for addressing them.
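
To make the idea of starting and stopping DPRs without blocking ingestion concrete, here is a minimal single-process sketch. It is not the system described in the paper: the names `FluidPipeline`, `DPR`, `run_idle`, and the per-DPR cursor are hypothetical, chosen only for illustration, and the sketch assumes an append-only raw log that each DPR catches up on independently.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class DPR:
    """Hypothetical data preprocessing routine with its own progress cursor."""
    name: str
    apply: Callable[[str], Any]   # e.g. a parser or an index-key extractor
    cursor: int = 0               # number of raw records already processed
    active: bool = False
    output: List[Any] = field(default_factory=list)


class FluidPipeline:
    """Toy model (an assumption, not the paper's design): ingestion appends
    to a raw log, and DPRs consume that log only when given spare capacity."""

    def __init__(self) -> None:
        self.raw_log: List[str] = []
        self.dprs: Dict[str, DPR] = {}

    def ingest(self, record: str) -> None:
        # Ingestion only appends; it never waits for any DPR.
        self.raw_log.append(record)

    def register(self, name: str, apply: Callable[[str], Any]) -> None:
        self.dprs[name] = DPR(name, apply)

    def start(self, name: str) -> None:
        self.dprs[name].active = True

    def stop(self, name: str) -> None:
        # Stopping keeps the cursor and output, so a later start resumes.
        self.dprs[name].active = False

    def run_idle(self, budget: int) -> int:
        """Spend up to `budget` processing steps on the active DPRs and
        return the number of steps actually used."""
        done = 0
        for dpr in self.dprs.values():
            while dpr.active and done < budget and dpr.cursor < len(self.raw_log):
                dpr.output.append(dpr.apply(self.raw_log[dpr.cursor]))
                dpr.cursor += 1
                done += 1
        return done


pipeline = FluidPipeline()
pipeline.register("parse", lambda r: r.split(","))
for i in range(5):
    pipeline.ingest(f"{i},ok")

pipeline.start("parse")
pipeline.run_idle(budget=3)   # idle capacity appears: process 3 records
pipeline.stop("parse")        # capacity is reclaimed: pause the DPR
pipeline.ingest("5,ok")       # ingestion continues regardless
pipeline.start("parse")
pipeline.run_idle(budget=10)  # the DPR resumes and catches up
```

In this sketch the `budget` argument stands in for whatever idle or preemptible capacity is available at the moment. A query arriving while a DPR is only partly caught up would have to combine the preprocessed prefix with the raw tail of the log, which relates to the ad-hoc query processing challenge the abstract mentions.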

Authors (2)
