Automatic Configuration Tuning on Cloud Database: A Survey (2404.06043v1)
Abstract: Faced with the challenges of big data, modern cloud database management systems are designed to store, organize, and retrieve data efficiently, providing high performance, scalability, and reliability for complex data processing and analysis. However, achieving good performance in modern databases is non-trivial: they are notorious for exposing dozens of configurable knobs spanning hardware setup, software setup, and physical and logical database design, all of which control runtime behavior and affect performance. To find the configuration that yields optimal performance, extensive research has been conducted on automatic parameter tuning in DBMSs. This paper provides a comprehensive survey of the predominant configuration tuning techniques, including Bayesian optimization-based, neural network-based, reinforcement learning-based, and search-based solutions. Moreover, it examines the fundamental components of the parameter tuning pipeline, including the tuning objective, workload characterization, feature pruning, knowledge transfer from prior experience, configuration recommendation, and experimental settings. For each component we compare the underlying techniques and the corresponding solutions, and we describe the experimental settings used for performance evaluation. Finally, we conclude the paper and outline future research opportunities. This survey aims to help researchers and practitioners better understand automatic parameter tuning in cloud databases by presenting state-of-the-art solutions, research directions, and evaluation benchmarks.
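As a concrete illustration of the Bayesian optimization-based family surveyed here, the sketch below shows a minimal knob-tuning loop: a Gaussian process surrogate models the performance surface and an expected-improvement acquisition proposes the next configuration to evaluate. This is only a sketch under stated assumptions, not any specific system's implementation; the knob names, their ranges, and the `run_benchmark()` stub are hypothetical placeholders, and a real tuner would apply the configuration to the DBMS and replay a workload to measure throughput or latency.

```python
# Minimal Bayesian-optimization-style knob tuning sketch (hypothetical knobs and benchmark).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

KNOB_BOUNDS = np.array([[64.0, 4096.0],   # e.g. buffer pool size in MB (hypothetical knob)
                        [1.0, 64.0]])     # e.g. max write threads (hypothetical knob)

def run_benchmark(config):
    """Stand-in for applying `config` to the DBMS and replaying a workload; returns throughput."""
    buf, threads = config
    return -((buf - 2048.0) ** 2) / 1e6 - ((threads - 16.0) ** 2) / 50.0 + np.random.normal(0, 0.1)

def sample_configs(n):
    """Draw n random configurations uniformly within the knob bounds."""
    low, high = KNOB_BOUNDS[:, 0], KNOB_BOUNDS[:, 1]
    return np.random.uniform(low, high, size=(n, len(KNOB_BOUNDS)))

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    """Expected improvement over the best observed throughput (maximization form)."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Initial random design, then iterate: fit surrogate, maximize acquisition, evaluate.
X = sample_configs(5)
y = np.array([run_benchmark(x) for x in X])
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)

for _ in range(20):
    gp.fit(X, y)
    candidates = sample_configs(1000)              # cheap random search over the acquisition
    x_next = candidates[np.argmax(expected_improvement(candidates, gp, y.max()))]
    y_next = run_benchmark(x_next)
    X, y = np.vstack([X, x_next]), np.append(y, y_next)

print("best configuration:", X[np.argmax(y)], "observed throughput:", y.max())
```

In practice the dominant cost is each `run_benchmark()` call, which is why the surveyed systems focus on sample efficiency, for example through better surrogate models, feature pruning of unimportant knobs, and transferring knowledge from past tuning sessions.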
Authors: Limeng Zhang, M. Ali Babar