A comprehensive analysis of concept drift locality in data streams (2311.06396v2)

Published 10 Nov 2023 in cs.LG

Abstract: Adapting to drifting data streams is a significant challenge in online learning. Concept drift must be detected for effective model adaptation to evolving data properties. Concept drift can impact the data distribution entirely or partially, which makes it difficult for drift detectors to accurately identify the concept drift. Despite the numerous concept drift detectors in the literature, standardized procedures and benchmarks for comprehensive evaluation considering the locality of the drift are lacking. We present a novel categorization of concept drift based on its locality and scale. A systematic approach leads to a set of 2,760 benchmark problems, reflecting various difficulty levels following our proposed categorization. We conduct a comparative assessment of 9 state-of-the-art drift detectors across diverse difficulties, highlighting their strengths and weaknesses for future research. We examine how drift locality influences the classifier performance and propose strategies for different drift categories to minimize the recovery time. Lastly, we provide lessons learned and recommendations for future concept drift research. Our benchmark data streams and experiments are publicly available at https://github.com/gabrieljaguiar/locality-concept-drift.
