Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines (2405.11191v1)
Abstract: Machine learning inference pipelines commonly encountered in data science and industry often require real-time responsiveness because they are user-facing. Meeting this requirement becomes particularly challenging when certain input features require aggregating a large volume of data online. Recent literature on interpretable machine learning reveals that most machine learning models exhibit a notable degree of resilience to variations in their input. This suggests that models can accommodate approximate input features with minimal discernible impact on accuracy. In this paper, we introduce Biathlon, a novel ML serving system that leverages the inherent resilience of models and determines the optimal degree of approximation for each aggregation feature. This approach enables maximum speedup while guaranteeing a bound on accuracy loss. We evaluate Biathlon on real pipelines from both industry applications and data science competitions, demonstrating its ability to meet real-time latency requirements by achieving 5.3x to 16.6x speedup with almost no accuracy loss.
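The idea of serving a prediction from an approximate aggregation feature can be illustrated with a small sketch. This is not Biathlon's algorithm; it is a simplified illustration under stated assumptions: a single mean-aggregation feature estimated by uniform sampling, a normal approximation for the estimator's error (ignoring the finite-population correction), and a hypothetical linear scorer standing in for the model. The names `approx_mean`, `score`, and `serve` are invented for this example.

```python
import numpy as np


def approx_mean(values, fraction, rng):
    """Estimate the mean from a uniform sample; return (estimate, standard error)."""
    n = max(2, int(len(values) * fraction))
    sample = rng.choice(values, size=n, replace=False)
    return sample.mean(), sample.std(ddof=1) / np.sqrt(n)


def score(x_exact, x_agg):
    """Hypothetical stand-in model: a fixed linear scorer (class 1 if score > 0)."""
    return 0.5 * x_exact + 2.0 * x_agg - 3.0


def serve(values, x_exact, rng, confidence=0.95, n_draws=1000):
    """Grow the sample until the predicted class is stable under feature uncertainty.

    Returns (predicted_class, fraction_of_rows_used).
    """
    fraction = 0.01
    while fraction < 1.0:
        est, se = approx_mean(values, fraction, rng)
        # Propagate the estimator's uncertainty through the model by Monte Carlo:
        # draw plausible feature values and see how often the class agrees.
        draws = rng.normal(est, se, size=n_draws)
        positive_rate = (score(x_exact, draws) > 0).mean()
        if max(positive_rate, 1.0 - positive_rate) >= confidence:
            return int(positive_rate >= 0.5), fraction
        fraction = min(1.0, fraction * 2)  # not confident enough: read more rows
    # Fall back to the exact aggregate over all rows.
    return int(score(x_exact, values.mean()) > 0), 1.0


rng = np.random.default_rng(0)
values = rng.normal(5.0, 1.0, size=100_000)  # synthetic rows to aggregate
pred, fraction = serve(values, x_exact=1.0, rng=rng)
exact_pred = int(score(1.0, values.mean()) > 0)
print(pred, exact_pred, fraction)
```

On this synthetic input the score sits far from the decision boundary, so a small sample already yields the same class as the exact aggregate. Inputs closer to the boundary would force the loop to read more rows, which is the trade-off between approximation and accuracy that the abstract describes.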