SQLFlow: A Bridge between SQL and Machine Learning (2001.06846v1)
Abstract: Industrial AI systems are mostly end-to-end ML workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow, a system for developing such workflows efficiently in SQL. SQL lets developers write short programs that focus on the purpose (what) and ignore the procedure (how). Previous database systems extended their own SQL dialects to support ML. SQLFlow (https://sqlflow.org/sqlflow) takes a different strategy: it works as a bridge between various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines such as TensorFlow, XGBoost, and scikit-learn. We extended the SQL syntax carefully so that the extension works with various SQL dialects, and we implemented the extension with a collaborative parsing algorithm that we devised. SQLFlow is efficient and can express a wide variety of ML techniques: supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; and data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.
- Yi Wang
- Yang Yang
- Weiguo Zhu
- Yi Wu
- Xu Yan
- Yongfeng Liu
- Yu Wang
- Liang Xie
- Ziyao Gao
- Wenjing Zhu
- Xiang Chen
- Wei Yan
- Mingjie Tang
- Yuan Tang