A Historical Context for Data Streams (2310.19811v1)
Published 18 Oct 2023 in cs.LG, cs.DB, cs.SY, and eess.SY
Abstract: Machine learning from data streams is an active and growing research area. Research on learning from streaming data typically makes strict assumptions linked to computational resource constraints, including requirements that stream mining algorithms inspect each instance at most once and be ready to give a prediction at any time. Here we review the history of data streams research, placing the common assumptions used in machine learning over data streams in their historical context.
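The two assumptions named in the abstract can be made concrete with a small sketch: a learner that touches each instance once and can be queried for a prediction at any point. The online perceptron used here is only an illustrative choice of single-pass learner, not the paper's method. The class and function names (`OnlinePerceptron`, `learn_one`, `prequential`) are hypothetical and belong to this sketch, not to any particular library. The evaluation loop follows the test-then-train (prequential) protocol that is common in stream learning.

```python
from typing import Iterable, List, Tuple


class OnlinePerceptron:
    """Hypothetical single-pass binary classifier (labels in {-1, +1})."""

    def __init__(self, n_features: int, lr: float = 1.0) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b: float = 0.0
        self.lr = lr

    def predict(self, x: List[float]) -> int:
        # Anytime property: a prediction is available after any number
        # of updates, including before the first one.
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

    def learn_one(self, x: List[float], y: int) -> None:
        # Update only on mistakes. The instance is not stored, so it can
        # never be revisited later.
        if self.predict(x) != y:
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * y


def prequential(model: OnlinePerceptron,
                stream: Iterable[Tuple[List[float], int]]) -> float:
    """Test-then-train: predict on each instance before learning from it."""
    correct = 0
    seen = 0
    for x, y in stream:  # a single pass: each instance is inspected once
        correct += int(model.predict(x) == y)
        model.learn_one(x, y)
        seen += 1
    return correct / seen if seen else 0.0


if __name__ == "__main__":
    # A toy stream built as a generator, so it can only be consumed once.
    toy = (([2.0], 1) if i % 2 == 0 else ([-2.0], -1) for i in range(10))
    model = OnlinePerceptron(n_features=1)
    print(prequential(model, toy))
```

Because `prequential` only iterates over its input, it works equally with a list or a one-shot generator. This mirrors the constraint that the stream is not available for a second pass.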