
Fast Approximations and Coresets for (k, l)-Median under Dynamic Time Warping (2312.09838v2)

Published 15 Dec 2023 in cs.CG

Abstract: We present algorithms for the computation of $\varepsilon$-coresets for $k$-median clustering of point sequences in $\mathbb{R}^d$ under the $p$-dynamic time warping (DTW) distance. Coresets under DTW have not been investigated before, and the analysis is not directly accessible to existing methods as DTW is not a metric. The three main ingredients that allow our construction of coresets are the adaptation of the $\varepsilon$-coreset framework of sensitivity sampling, bounds on the VC dimension of approximations to the range spaces of balls under DTW, and new approximation algorithms for the $k$-median problem under DTW. We achieve our results by investigating approximations of DTW that provide a trade-off between the provided accuracy and amenability to known techniques. In particular, we observe that given $n$ curves under DTW, one can directly construct a metric that approximates DTW on this set, permitting the use of the wealth of results on metric spaces for clustering purposes. The resulting approximations are the first with polynomial running time and achieve an approximation factor very similar to that of state-of-the-art techniques. We apply our results to produce a practical algorithm approximating $(k,\ell)$-median clustering under DTW.
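To make two claims in the abstract concrete (DTW is not a metric, and a metric can be built on a finite set of curves), here is a minimal Python sketch. It assumes the standard definition of $p$-DTW as the $p$-th root of the minimum, over all warping paths, of the sum of $p$-th powers of Euclidean distances between matched points. The names `dtw_p` and `metric_closure` are illustrative, and using shortest-path distances (the metric closure) is one natural way to obtain a metric from pairwise DTW values; it is an assumption here, not necessarily the exact construction of the paper.

```python
import math

def dtw_p(a, b, p=1.0):
    """p-DTW between two point sequences given as lists of equal-dimension tuples."""
    n, m = len(a), len(b)
    # cost[i][j]: minimum sum of p-th powers of point distances over all
    # warping paths that align the prefix a[:i] with the prefix b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1]) ** p
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] ** (1.0 / p)

def metric_closure(dist):
    """Shortest-path distances (Floyd-Warshall) in the complete graph weighted by dist.

    Assumption: this is used only to illustrate how a metric can be derived
    from pairwise DTW values on a finite set of curves.
    """
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Three one-dimensional curves on which 1-DTW violates the triangle inequality.
x = [(0.0,), (0.0,), (0.0,)]
y = [(0.0,), (1.0,)]
z = [(1.0,), (1.0,), (1.0,)]
curves = [x, y, z]

pairwise = [[dtw_p(s, t) for t in curves] for s in curves]
closure = metric_closure(pairwise)

print(pairwise[0][2], pairwise[0][1] + pairwise[1][2])  # direct distance vs. detour via y
print(closure[0][2])                                    # closure distance between x and z
```

On these curves the direct distance between `x` and `z` exceeds the sum of the distances via `y`, so the triangle inequality fails. The closure never exceeds the original DTW value and satisfies the triangle inequality by construction; how closely it approximates DTW in general is the kind of question the paper analyzes and is not established by this sketch.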
