Monitoring Machine Learning Models: Online Detection of Relevant Deviations (2309.15187v1)

Published 26 Sep 2023 in cs.LG, stat.AP, and stat.ML

Abstract: Machine learning models are essential tools in many domains, but their performance can degrade over time due to changes in the data distribution or other factors. On one hand, detecting and addressing such degradation is crucial for maintaining a model's reliability. On the other hand, given enough data, any arbitrarily small change in quality can be detected. Since interventions such as model re-training or replacement can be expensive, we argue that they should be carried out only when changes exceed a given threshold. We propose a sequential monitoring scheme to detect these relevant changes. The proposed method reduces unnecessary alerts and overcomes the multiple testing problem by accounting for the temporal dependence of the measured model quality. Conditions ensuring consistency and a specified asymptotic level are provided. Empirical validation on simulated and real data demonstrates the superiority of our approach in detecting relevant changes in model quality compared to benchmark methods. Our research contributes a practical solution for distinguishing between minor fluctuations and meaningful degradation in machine learning model performance, ensuring reliability in dynamic environments.
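
The abstract describes the method only at a high level, but its central idea, alerting only when quality degrades by more than a practically relevant margin, can be illustrated with a short sketch. The example below uses a simple CUSUM-style detector over batch-level quality scores; the class name `RelevantChangeMonitor` and the parameters `delta` (relevance margin) and `h` (alarm threshold) are illustrative assumptions, not the paper's notation, and the sketch does not model the temporal dependence that the paper's scheme explicitly accounts for.

```python
import numpy as np

class RelevantChangeMonitor:
    """Illustrative CUSUM-style monitor: alert only when model quality
    drops by more than a practically relevant margin `delta`.

    A simplified sketch of threshold-based sequential detection,
    not the paper's actual procedure.
    """

    def __init__(self, baseline_quality, delta=0.05, h=0.5):
        self.baseline = baseline_quality  # quality level of the deployed model
        self.delta = delta                # smallest degradation worth acting on
        self.h = h                        # alarm threshold for the CUSUM statistic
        self.cusum = 0.0

    def update(self, batch_quality):
        # Accumulate evidence that quality fell below (baseline - delta).
        # Drops smaller than delta yield negative increments, so minor
        # fluctuations pull the statistic back toward zero.
        drift = (self.baseline - self.delta) - batch_quality
        self.cusum = max(0.0, self.cusum + drift)
        return self.cusum > self.h  # True => relevant degradation detected

# Usage: feed per-batch quality scores (e.g., accuracies) from production.
rng = np.random.default_rng(0)
monitor = RelevantChangeMonitor(baseline_quality=0.90, delta=0.05, h=0.5)
stream = np.concatenate([
    rng.normal(0.90, 0.01, 200),  # stable period: fluctuations below delta
    rng.normal(0.80, 0.01, 200),  # relevant degradation (drop > delta)
])
alarm_at = next((t for t, q in enumerate(stream) if monitor.update(q)), None)
print(f"alarm raised at batch {alarm_at}")
```

Because each batch is compared against `baseline - delta` rather than the baseline itself, sub-threshold fluctuations cannot accumulate into an alarm, mirroring the abstract's distinction between minor fluctuations and meaningful degradation.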
