
Adaptive Recursive Query Optimization (2312.04282v2)

Published 7 Dec 2023 in cs.DB and cs.PL

Abstract: Performance-critical industrial applications, including large-scale program, network, and distributed system analyses, are increasingly reliant on recursive queries for data analysis. Yet traditional relational algebra-based query optimization techniques do not scale well to recursive query processing due to the iterative nature of query evaluation, where relation cardinalities can change unpredictably during the course of a single query execution. To avoid error-prone cardinality estimation, adaptive query processing techniques use runtime information to inform query optimization, but these systems are not optimized for the specific needs of recursive query processing. In this paper, we introduce Adaptive Metaprogramming, an innovative technique that shifts recursive query optimization and code generation from compile-time to runtime using principled metaprogramming, enabling dynamic optimization and re-optimization before and after query execution has begun. We present a custom join-ordering optimization applicable at multiple stages during query compilation and execution. Through Carac, a custom Datalog engine, we evaluate the optimization potential of Adaptive Metaprogramming and show unoptimized recursive query execution time can be improved by three orders of magnitude and hand-optimized queries by 6x.
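The core difficulty named in the abstract, relation cardinalities that change during a single recursive query, can be illustrated with a small sketch. The code below is not Carac and is not the paper's join-ordering algorithm. It is a toy bottom-up Datalog evaluator in Python, written under the assumption that a simple greedy heuristic (join the smallest relation first) is enough to show the idea of re-deciding the join order at every iteration from sizes observed at runtime. All names (`join_atoms`, `evaluate`, the `reorder` flag) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]   # (relation name, variable names)
Rule = Tuple[Atom, List[Atom]]       # (head atom, body atoms)
Fact = Tuple[int, ...]


def join_atoms(atoms: List[Atom], relations: Dict[str, Set[Fact]]) -> List[dict]:
    """Nested-loop join of the body atoms in the given order.

    Returns one variable binding (a dict) per matching combination of facts.
    """
    bindings: List[dict] = [{}]
    for name, variables in atoms:
        next_bindings = []
        for binding in bindings:
            for fact in relations[name]:
                candidate = dict(binding)
                # A fact matches if it agrees with every variable bound so far.
                if all(candidate.setdefault(v, value) == value
                       for v, value in zip(variables, fact)):
                    next_bindings.append(candidate)
        bindings = next_bindings
    return bindings


def evaluate(rules: List[Rule], facts: Dict[str, Set[Fact]],
             reorder: bool = True) -> Dict[str, Set[Fact]]:
    """Naive fixpoint evaluation; optionally re-sorts each rule body
    by the current relation sizes before every join (assumed heuristic)."""
    relations = {name: set(rows) for name, rows in facts.items()}
    for (head_name, _), body in rules:
        relations.setdefault(head_name, set())
        for name, _ in body:
            relations.setdefault(name, set())

    changed = True
    while changed:
        changed = False
        for (head_name, head_vars), body in rules:
            # Sizes are read at runtime, so the order can differ per iteration.
            # This ignores which atoms share variables, so it may pick a
            # cross product; the result is still correct, only slower.
            order = (sorted(body, key=lambda atom: len(relations[atom[0]]))
                     if reorder else body)
            derived = {tuple(b[v] for v in head_vars)
                       for b in join_atoms(order, relations)}
            new_facts = derived - relations[head_name]
            if new_facts:
                relations[head_name] |= new_facts
                changed = True
    return relations


# Transitive closure: path(x,y) :- edge(x,y).  path(x,z) :- path(x,y), edge(y,z).
rules = [
    (("path", ("x", "y")), [("edge", ("x", "y"))]),
    (("path", ("x", "z")), [("path", ("x", "y")), ("edge", ("y", "z"))]),
]
facts = {"edge": {(1, 2), (2, 3), (3, 4)}}
result = evaluate(rules, facts)
```

In this example `path` starts empty and grows past `edge` as the fixpoint proceeds, so the sorted order of the recursive rule's body changes between iterations. A static plan chosen before execution could not observe that growth, which is the motivation the abstract gives for moving optimization to runtime.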

