
Multi-Agent Join (2312.14291v1)

Published 21 Dec 2023 in cs.DB and cs.MA

Abstract: It is crucial to provide real-time performance in many applications, such as interactive and exploratory data analysis. In these settings, users often need to view subsets of query results quickly. It is challenging to deliver such results over large datasets for relational operators over multiple relations, such as join. Join algorithms usually spend a long time scanning and attempting to join parts of relations that may not generate any result. Current solutions usually require lengthy and repeated preprocessing, which is costly and may not be possible in many settings. Also, they often support only restricted types of joins. In this paper, we outline a novel approach for achieving efficient join processing in which a scan operator of the join learns, during query execution, the portions of its relations that might satisfy the join predicate. We further improve this method using an algorithm in which both scan operators collaboratively learn an efficient join execution strategy. We also show that this approach generalizes traditional and non-learning methods for joining. Our extensive empirical studies using standard benchmarks indicate that this approach outperforms similar methods considerably.

