
SQLFlow: A Bridge between SQL and Machine Learning (2001.06846v1)

Published 19 Jan 2020 in cs.DB and cs.LG

Abstract: Industrial AI systems are mostly end-to-end ML workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs that focus on the purpose (what) and ignore the procedure (how). Previous database systems extended their SQL dialects to support ML. SQLFlow (https://sqlflow.org/sqlflow) takes a different strategy: it works as a bridge between various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines such as TensorFlow, XGBoost, and scikit-learn. We extended the SQL syntax carefully so that the extension works with various SQL dialects. We implemented the extension by inventing a collaborative parsing algorithm. SQLFlow is efficient and expressive enough to cover a wide variety of ML techniques: supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.

Authors (14)
  1. Yi Wang (1038 papers)
  2. Yang Yang (884 papers)
  3. Weiguo Zhu (3 papers)
  4. Yi Wu (171 papers)
  5. Xu Yan (130 papers)
  6. Yongfeng Liu (2 papers)
  7. Yu Wang (939 papers)
  8. Liang Xie (38 papers)
  9. Ziyao Gao (4 papers)
  10. Wenjing Zhu (11 papers)
  11. Xiang Chen (343 papers)
  12. Wei Yan (127 papers)
  13. Mingjie Tang (22 papers)
  14. Yuan Tang (37 papers)
Citations (9)
