
Calculating the matrix profile from noisy data (2306.10151v1)

Published 16 Jun 2023 in cs.LG, cs.MS, cs.NA, and math.NA

Abstract: The matrix profile (MP) is a data structure computed from a time series which encodes the data required to locate motifs and discords, corresponding to recurring patterns and outliers respectively. When a time series contains noisy data, the conventional approach is to pre-filter it to remove the noise, but this cannot be applied in unsupervised settings where patterns and outliers are not annotated. The resilience of the algorithm used to generate the MP when faced with noisy data remains unknown. We measure the similarity between the MP of the original time series and MPs generated from the same data with noise added, under a range of parameter settings that include adding duplicates and adding irrelevant data. We use three real-world data sets drawn from diverse domains for these experiments. Based on dissimilarities between the MPs, our results suggest that MP generation is resilient to a small amount of noise being introduced into the data, but as the amount of noise increases this resilience disappears.
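
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's method. It makes several assumptions that are not in the abstract: a naive z-normalised Euclidean self-join stands in for the MP algorithm, noise is additive Gaussian (the paper instead adds duplicates and irrelevant data), the dissimilarity is a plain mean absolute difference between the two MPs, and the names `matrix_profile` and `mp_dissimilarity` are hypothetical.

```python
import numpy as np

def matrix_profile(ts, m):
    """Naive self-join matrix profile (assumed z-normalised Euclidean distance)."""
    ts = np.asarray(ts, dtype=float)
    n_sub = len(ts) - m + 1
    # Every length-m subsequence as one row.
    subs = np.lib.stride_tricks.sliding_window_view(ts, m)
    mu = subs.mean(axis=1, keepdims=True)
    sd = subs.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant windows: avoid division by zero
    z = (subs - mu) / sd
    excl = max(1, m // 2)  # exclusion zone to skip trivial self-matches
    mp = np.full(n_sub, np.inf)
    for i in range(n_sub):
        d = np.linalg.norm(z - z[i], axis=1)
        d[max(0, i - excl):min(n_sub, i + excl + 1)] = np.inf
        mp[i] = d.min()  # distance to the nearest non-trivial neighbour
    return mp

def mp_dissimilarity(original, noisy, m):
    """Assumed measure: mean absolute difference between two equal-length MPs."""
    return float(np.mean(np.abs(matrix_profile(original, m) - matrix_profile(noisy, m))))

# Synthetic stand-in for a real data set: a clean periodic signal.
rng = np.random.default_rng(0)
t = np.arange(400)
clean = np.sin(2 * np.pi * t / 50)

# Dissimilarity between the clean MP and MPs computed at increasing noise levels.
results = {}
for sigma in (0.0, 0.05, 0.5):
    noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
    results[sigma] = mp_dissimilarity(clean, noisy, m=25)
```

Additive noise keeps the two series the same length, so the MPs can be compared point by point. Inserting duplicates or irrelevant data changes the length, in which case an alignment-based measure would be needed instead of this elementwise difference.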
