MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases (2410.18406v1)
Abstract: Recent improvements in translating natural language into structured query language (SQL) can be attributed to advances in large language models (LLMs). Open-source LLMs tailored to a specific database dialect, such as MySQL, have shown strong performance. However, cloud service providers are looking for unified database management services (e.g., Cosmos DB from Azure, Amazon Aurora from AWS, Lindorm from AlibabaCloud) that support multiple dialects. This requirement has led to the concept of multi-dialect query generation, which poses challenges for LLMs, including syntactic differences among dialects and imbalanced data distribution across them. To tackle these challenges, we propose MoMQ, a novel Mixture-of-Experts-based multi-dialect query generation framework covering both relational and non-relational databases. MoMQ employs a dialect expert group for each dialect and a multi-level routing strategy to handle dialect-specific knowledge, reducing interference during query generation. Additionally, a shared expert group is introduced to address data imbalance, facilitating the transfer of common knowledge from high-resource dialects to low-resource ones. Furthermore, we have developed a high-quality multi-dialect query generation benchmark that covers relational and non-relational databases, including MySQL, PostgreSQL, Cypher for Neo4j, and nGQL for NebulaGraph. Extensive experiments show that MoMQ performs effectively and robustly even in resource-imbalanced scenarios.
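The abstract names the key architectural pieces (per-dialect expert groups, a shared expert group, and multi-level routing) but gives no implementation details. The sketch below is a minimal PyTorch illustration of how such a layer could be wired, assuming a two-level scheme: a hard dialect-level route selects the matching expert group, then a token-level top-k gate mixes experts inside that group, with a shared group always active to carry dialect-agnostic knowledge. All class names, group sizes, and routing choices here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): an MoE feed-forward layer with
# per-dialect expert groups, a shared expert group, and two-level routing.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertFFN(nn.Module):
    """A single expert: a standard two-layer feed-forward block."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MultiDialectMoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=1024, num_dialects=4,
                 experts_per_group=4, shared_experts=2, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One expert group per dialect (e.g., MySQL, PostgreSQL, Cypher, nGQL).
        self.dialect_groups = nn.ModuleList([
            nn.ModuleList([ExpertFFN(d_model, d_hidden) for _ in range(experts_per_group)])
            for _ in range(num_dialects)
        ])
        # Shared experts capture knowledge common to all dialects.
        self.shared_group = nn.ModuleList(
            [ExpertFFN(d_model, d_hidden) for _ in range(shared_experts)]
        )
        # Token-level router for each dialect group, plus one for the shared group.
        self.group_routers = nn.ModuleList(
            [nn.Linear(d_model, experts_per_group) for _ in range(num_dialects)]
        )
        self.shared_router = nn.Linear(d_model, shared_experts)

    def _route(self, x, experts, router, top_k):
        # Token-level top-k gating within one expert group.
        gate = F.softmax(router(x), dim=-1)                  # [tokens, n_experts]
        weights, idx = gate.topk(min(top_k, len(experts)), dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(idx.size(-1)):
            for e, expert in enumerate(experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

    def forward(self, x: torch.Tensor, dialect_id: int) -> torch.Tensor:
        # Level 1: hard route to the expert group of the query's dialect.
        dialect_out = self._route(x, self.dialect_groups[dialect_id],
                                  self.group_routers[dialect_id], self.top_k)
        # Shared group is always active, regardless of dialect.
        shared_out = self._route(x, self.shared_group, self.shared_router, self.top_k)
        return x + dialect_out + shared_out  # residual combination


# Usage: route a batch of tokens tagged with a hypothetical dialect id.
layer = MultiDialectMoELayer()
tokens = torch.randn(8, 512)
print(layer(tokens, dialect_id=1).shape)  # torch.Size([8, 512])
```

Keeping the shared group outside the dialect-level route is one plausible way to let low-resource dialects benefit from parameters trained mostly on high-resource data, which is the data-imbalance motivation the abstract describes.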