Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD -- Extended Version (2407.15363v1)

Published 22 Jul 2024 in cs.DB

Abstract: Modern organizations manage their data with a wide variety of specialized cloud database engines (e.g., Aurora, BigQuery). However, designing and managing such infrastructures is hard. Developers must consider many possible designs with non-obvious performance consequences. Moreover, current software abstractions tightly couple applications to specific systems (e.g., through engine-specific clients), which makes the design difficult to change after initial deployment. A better solution would virtualize cloud data management, allowing developers to declaratively specify their workload requirements and rely on automated solutions to design and manage the physical realization. In this paper, we present a technique called blueprint planning that achieves this vision. The key idea is to project data infrastructure design decisions into a unified design space (blueprints). We then systematically search over candidate blueprints using cost-based optimization, leveraging learned models to predict the utility of a blueprint on the workload. We use this technique to build BRAD, the first cloud data virtualization system. BRAD users issue queries to a single SQL interface that can be backed by multiple cloud database services. BRAD automatically selects the most suitable engine for each query, provisions and manages resources to minimize costs, and evolves the infrastructure to adapt to workload shifts. Our evaluation shows that BRAD meets user-defined performance targets and improves cost savings by 1.6–13× compared to serverless auto-scaling or HTAP systems.
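To make blueprint planning more concrete, here is a minimal sketch under heavily simplified assumptions. It is not BRAD's implementation. A blueprint here is a hypothetical record of how many Aurora replicas and Redshift nodes to provision, whether Athena is enabled, and which engine serves each analytical query. The sketch omits table placement and the cost of transitioning between blueprints. The prices and latency formulas are invented placeholders standing in for BRAD's learned models, and the search is a brute-force enumeration of a tiny design space. The planner returns the cheapest blueprint whose predicted query latencies all meet a user-defined target.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Optional, Sequence, Tuple

# HYPOTHETICAL toy prices; these are not real AWS prices.
AURORA_PER_REPLICA_HR = 0.50  # $/hour per provisioned Aurora instance
REDSHIFT_PER_NODE_HR = 1.00   # $/hour per provisioned Redshift node
ATHENA_PER_GB = 0.005         # $ per GB scanned by Athena


@dataclass(frozen=True)
class Blueprint:
    """One point in a toy unified design space (all fields hypothetical)."""
    aurora_replicas: int      # always >= 1 in this sketch
    redshift_nodes: int       # 0 means the engine is switched off
    athena_enabled: bool      # serverless: billed per query, not per hour
    routing: Tuple[str, ...]  # engine assigned to each analytical query


def predicted_latency_s(scan_gb: float, engine: str, bp: Blueprint) -> float:
    """Toy stand-in for learned run-time models (NOT the paper's models)."""
    if engine == "aurora":
        return 4.0 * scan_gb / bp.aurora_replicas
    if engine == "redshift":
        return 0.5 * scan_gb / bp.redshift_nodes
    return 2.0 + 0.3 * scan_gb  # athena: fixed startup plus scan time


def hourly_cost(bp: Blueprint, scans_gb: Sequence[float], runs_per_hr: float) -> float:
    """Provisioned cost plus per-query cost for queries routed to Athena."""
    cost = bp.aurora_replicas * AURORA_PER_REPLICA_HR
    cost += bp.redshift_nodes * REDSHIFT_PER_NODE_HR
    for gb, engine in zip(scans_gb, bp.routing):
        if engine == "athena":
            cost += gb * ATHENA_PER_GB * runs_per_hr
    return cost


def candidates(n_queries: int) -> Iterator[Blueprint]:
    """Enumerate every provisioning choice and every routing over enabled engines."""
    for replicas, nodes, athena in product((1, 2), (0, 2, 4), (False, True)):
        engines = ["aurora"]
        if nodes:
            engines.append("redshift")
        if athena:
            engines.append("athena")
        for routing in product(engines, repeat=n_queries):
            yield Blueprint(replicas, nodes, athena, routing)


def plan(scans_gb: Sequence[float], runs_per_hr: float,
         latency_target_s: float) -> Optional[Blueprint]:
    """Cost-based search: cheapest candidate meeting the latency target, or None."""
    best, best_cost = None, float("inf")
    for bp in candidates(len(scans_gb)):
        if any(predicted_latency_s(gb, e, bp) > latency_target_s
               for gb, e in zip(scans_gb, bp.routing)):
            continue  # violates the performance target
        cost = hourly_cost(bp, scans_gb, runs_per_hr)
        if cost < best_cost:
            best, best_cost = bp, cost
    return best
```

Under these toy numbers, `plan([1.0, 40.0], runs_per_hr=10, latency_target_s=12.0)` routes the small query to Aurora and the large scan to a two-node Redshift cluster, while `plan([1.0], 10, 3.0)` prefers Athena because paying per query is cheaper than adding an Aurora replica. Exhaustive enumeration grows exponentially with the number of queries, so a practical planner would need a pruned or heuristic search.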
