
Online Query Scheduling on Source Permutation for Big Data Integration

Published 29 Mar 2015 in cs.DB | (1503.08400v1)

Abstract: Big data integration can involve a large number of sources with unpredictable redundancy among them. Building a central warehouse that integrates data from all sources then becomes infeasible, because of the sheer number of sources and their continuous updates. A practical alternative is online query scheduling, which queries the sources at runtime when a query is received. In this paper, we address the Time-Cost Minimization Problem for online query scheduling, and tackle the challenges of source permutation and statistics estimation in order to minimize the time cost of retrieving answers for queries received in real time. We propose an online scheduling strategy in which statistics refinement, source-permutation construction, and query execution proceed in parallel. Experimental results show that the strategy is efficient and scalable.
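To make the idea of a source permutation concrete, here is a minimal sketch of one plausible ordering heuristic. It is not the paper's algorithm. The `Source` class, the `greedy_permutation` function, the per-source `cost` values, and the use of explicit answer sets as a stand-in for coverage and overlap statistics are all assumptions made for illustration. The heuristic repeatedly picks the source with the most estimated new answers per unit of time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    # All fields are hypothetical; the abstract does not define a source model.
    name: str
    cost: float          # assumed estimated time to query this source
    answers: frozenset   # toy stand-in for coverage/overlap statistics


def greedy_permutation(sources):
    """Order sources by estimated new answers per unit time (greedy heuristic)."""
    remaining = list(sources)
    covered = set()   # answers already returned by sources placed earlier
    order = []
    while remaining:
        # Marginal gain counts only answers not yet covered, so redundant
        # sources are pushed toward the end of the permutation.
        best = max(remaining, key=lambda s: len(s.answers - covered) / s.cost)
        order.append(best)
        covered |= best.answers
        remaining.remove(best)
    return order


# Made-up example data: B is cheap, A is expensive and overlaps with B.
sources = [
    Source("A", cost=4.0, answers=frozenset({1, 2, 3, 4})),
    Source("B", cost=1.0, answers=frozenset({1, 2})),
    Source("C", cost=2.0, answers=frozenset({5, 6, 7})),
]
order = [s.name for s in greedy_permutation(sources)]
```

With this made-up data the resulting order is B, C, A: B is cheapest per answer, and once B has been queried, half of A's answers are redundant, so C moves ahead of A. In a real setting the answer sets would not be known in advance and would have to be replaced by estimated statistics, which is the estimation challenge the abstract refers to.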

Authors (2)
