
Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines (2405.11191v1)

Published 18 May 2024 in cs.DB and cs.LG

Abstract: Machine learning inference pipelines commonly encountered in data science and industries often require real-time responsiveness due to their user-facing nature. However, meeting this requirement becomes particularly challenging when certain input features require aggregating a large volume of data online. Recent literature on interpretable machine learning reveals that most machine learning models exhibit a notable degree of resilience to variations in input. This suggests that machine learning models can effectively accommodate approximate input features with minimal discernible impact on accuracy. In this paper, we introduce Biathlon, a novel ML serving system that leverages the inherent resilience of models and determines the optimal degree of approximation for each aggregation feature. This approach enables maximum speedup while ensuring a guaranteed bound on accuracy loss. We evaluate Biathlon on real pipelines from both industry applications and data science competitions, demonstrating its ability to meet real-time latency requirements by achieving 5.3x to 16.6x speedup with almost no accuracy loss.
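The idea in the abstract, that a model can tolerate an approximate aggregation feature, can be illustrated with a small sketch. This is not Biathlon's algorithm: the logistic `toy_model`, the batch-wise sampling loop, and the stopping rule (stop once the model output barely changes across the feature's approximate 95% confidence interval) are all assumptions chosen for illustration, and the heuristic gives no formal accuracy guarantee.

```python
import math
import random
import statistics


def toy_model(avg_amount: float) -> float:
    """Hypothetical downstream model: a logistic score over one aggregate feature."""
    return 1.0 / (1.0 + math.exp(-(0.05 * avg_amount - 2.0)))


def approximate_inference(values, model, tolerance, batch=500, z=1.96, seed=0):
    """Grow a random sample of `values` until the model output looks stable.

    Returns (prediction, rows_used). The stopping rule is an illustrative
    heuristic, not the procedure described in the paper.
    """
    order = list(values)
    if not order:
        raise ValueError("values must not be empty")
    batch = max(batch, 2)  # stdev needs at least two rows
    random.Random(seed).shuffle(order)  # a prefix of a shuffled list is a uniform sample
    used = 0
    while True:
        used = min(used + batch, len(order))
        sample = order[:used]
        mean = statistics.fmean(sample)
        if used == len(order):
            return model(mean), used  # all rows consumed: the feature is exact
        stderr = statistics.stdev(sample) / math.sqrt(used)
        # Evaluate the model at both ends of the feature's confidence interval.
        low = model(mean - z * stderr)
        high = model(mean + z * stderr)
        if abs(high - low) <= tolerance:
            return model(mean), used


# Synthetic data standing in for a large table of rows to aggregate.
rng = random.Random(42)
amounts = [rng.gauss(40.0, 15.0) for _ in range(200_000)]

exact = toy_model(statistics.fmean(amounts))
approx, rows = approximate_inference(amounts, toy_model, tolerance=0.01)
print(f"exact={exact:.4f} approx={approx:.4f} rows_used={rows} of {len(amounts)}")
```

In this toy setting the loop stops after reading only part of the input, which is where any speedup would come from. The endpoint check only makes sense because `toy_model` is monotone in its single feature; a pipeline with several features or a non-monotone model would need a different way to propagate uncertainty.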

