MoMQ: Mixture-of-Experts Enhances Multi-Dialect Query Generation across Relational and Non-Relational Databases (2410.18406v1)

Published 24 Oct 2024 in cs.CL, cs.AI, cs.DB, and cs.LG

Abstract: The improvement in translating natural language to structured query language (SQL) can be attributed to the advancements in large language models (LLMs). Open-source LLMs, tailored for specific database dialects such as MySQL, have shown great performance. However, cloud service providers are looking for a unified database manager service (e.g., Cosmos DB from Azure, Amazon Aurora from AWS, Lindorm from AlibabaCloud) that can support multiple dialects. This requirement has led to the concept of multi-dialect query generation, which presents challenges to LLMs. These challenges include syntactic differences among dialects and imbalanced data distribution across multiple dialects. To tackle these challenges, we propose MoMQ, a novel Mixture-of-Experts-based multi-dialect query generation framework across both relational and non-relational databases. MoMQ employs a dialect expert group for each dialect and a multi-level routing strategy to handle dialect-specific knowledge, reducing interference during query generation. Additionally, a shared expert group is introduced to address data imbalance, facilitating the transfer of common knowledge from high-resource dialects to low-resource ones. Furthermore, we have developed a high-quality multi-dialect query generation benchmark that covers relational and non-relational databases such as MySQL, PostgreSQL, Cypher for Neo4j, and nGQL for NebulaGraph. Extensive experiments have shown that MoMQ performs effectively and robustly even in resource-imbalanced scenarios.
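
The abstract describes per-dialect expert groups, an always-active shared expert group, and multi-level routing. The sketch below shows how such a layer could be wired up in PyTorch; it is not the authors' implementation, and all class names, hyperparameters, and the simplified routing (treating the known target dialect as the first routing level, then top-k routing within that dialect's group) are assumptions made only for illustration.

```python
# A minimal sketch (not the MoMQ code) of a multi-dialect MoE layer:
# one expert group per dialect, a shared expert group for common knowledge,
# and two-level routing (dialect group first, then experts within it).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A small feed-forward expert."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x):
        return self.net(x)


class MultiDialectMoELayer(nn.Module):
    def __init__(self, d_model=256, d_hidden=512,
                 dialects=("mysql", "postgresql", "cypher", "ngql"),
                 experts_per_group=4, shared_experts=2, top_k=2):
        super().__init__()
        self.dialects = list(dialects)
        self.top_k = top_k
        # One expert group per dialect, plus a shared group for common knowledge.
        self.dialect_groups = nn.ModuleDict({
            d: nn.ModuleList(Expert(d_model, d_hidden) for _ in range(experts_per_group))
            for d in self.dialects
        })
        self.shared_group = nn.ModuleList(
            Expert(d_model, d_hidden) for _ in range(shared_experts)
        )
        # Second-level router: scores the experts inside the chosen dialect group.
        self.routers = nn.ModuleDict({
            d: nn.Linear(d_model, experts_per_group) for d in self.dialects
        })

    def forward(self, x: torch.Tensor, dialect: str) -> torch.Tensor:
        # x: (batch, seq, d_model). First-level routing is simplified here to the
        # known target dialect; the paper's routing strategy may differ.
        group = self.dialect_groups[dialect]
        logits = self.routers[dialect](x)                      # (B, S, num_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(group):
                mask = (idx[..., k] == e).unsqueeze(-1)        # tokens routed to expert e
                out = out + mask * weights[..., k:k + 1] * expert(x)

        # Shared experts are always active, carrying cross-dialect knowledge.
        for expert in self.shared_group:
            out = out + expert(x) / len(self.shared_group)
        return out


if __name__ == "__main__":
    layer = MultiDialectMoELayer()
    hidden = torch.randn(2, 16, 256)                 # dummy hidden states
    print(layer(hidden, dialect="cypher").shape)     # torch.Size([2, 16, 256])
```

The shared group here is dense (every token passes through it) while the dialect group is sparse (top-k per token), which mirrors the abstract's split between dialect-specific knowledge and common knowledge transferred from high-resource to low-resource dialects.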
