Optimizing sDTW for AMD GPUs (2403.06931v1)
Abstract: Subsequence Dynamic Time Warping (sDTW) is the metric of choice for many sequence matching and alignment tasks. Although sDTW is flexible and accurate, it is neither simple nor fast to compute. Considerable research effort has gone into parallel GPU implementations that exploit efficient memory-access and computation patterns, as well as features specific to particular vendors and architectures (notably NVIDIA's). We present an implementation of sDTW on AMD hardware using HIP and ROCm. Our implementation combines well-known parallel patterns with lower-level features offered by ROCm: it uses shuffles for intra-wavefront communication and shared memory to pass data between consecutive wavefronts. By constraining the input to batches of 512 queries of length 2,000, we tuned the number of reference elements processed by each thread for peak performance.
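The following minimal Python sketch makes the recurrence and the per-thread decomposition concrete. It is not the authors' HIP kernel. The squared-difference cost, the free-start/free-end boundary conditions, and the names `sdtw`, `sdtw_wavefront`, and `width` are assumptions chosen for illustration. `sdtw_wavefront` sequentially emulates threads that each own `width` consecutive reference elements and advance in lockstep along an anti-diagonal. Each thread hands its rightmost cell to its right-hand neighbour, which stands in for an intra-wavefront shuffle.

```python
import math


def sdtw(query, reference):
    """Plain sDTW: the match may start and end anywhere in the reference.
    Assumes a squared-difference local cost."""
    n = len(reference)
    prev = [0.0] * (n + 1)  # row 0 is all zeros: free start in the reference
    for q in query:
        cur = [math.inf] * (n + 1)  # column 0 is +inf for every query row
        for j in range(1, n + 1):
            cost = (q - reference[j - 1]) ** 2
            cur[j] = cost + min(prev[j - 1], prev[j], cur[j - 1])
        prev = cur
    return min(prev[1:])  # free end: best cell in the last query row


def sdtw_wavefront(query, reference, width):
    """Hypothetical emulation: thread t owns reference[t*width:(t+1)*width]
    and processes query row i = step - t + 1 at each lockstep step."""
    m, n = len(query), len(reference)
    num_threads = -(-n // width)  # ceiling division
    segments = [reference[t * width:(t + 1) * width] for t in range(num_threads)]
    seg_prev = [[0.0] * len(seg) for seg in segments]  # this thread's slice of row i-1
    diag_reg = [0.0] * num_threads  # D[i-1][start-1]; D[0][*] == 0
    outbox = [math.inf] * num_threads  # value each thread passes to thread t+1
    best = math.inf
    for step in range(m + num_threads - 1):
        new_outbox = [math.inf] * num_threads
        for t, seg in enumerate(segments):
            i = step - t + 1
            if not 1 <= i <= m:
                continue  # thread idle: its row is outside the query
            # D[i][start-1]: received from the left neighbour (shuffle stand-in)
            left = math.inf if t == 0 else outbox[t - 1]
            up_row = seg_prev[t]
            cur = []
            for k, r in enumerate(seg):
                diag_k = diag_reg[t] if k == 0 else up_row[k - 1]
                left_k = left if k == 0 else cur[k - 1]
                cur.append((query[i - 1] - r) ** 2 + min(diag_k, up_row[k], left_k))
            diag_reg[t] = left  # becomes the diagonal input for row i+1
            seg_prev[t] = cur
            new_outbox[t] = cur[-1]  # rightmost cell, sent to thread t+1
            if i == m:
                best = min(best, min(cur))
        outbox = new_outbox
    return best
```

For any `width`, `sdtw_wavefront(q, r, width)` returns the same value as `sdtw(q, r)`. For example, both return 0 for `q = [1, 2]` and `r = [0, 1, 2, 3]`, because `[1, 2]` occurs exactly in the reference. The width changes only how the work is split and how many cross-thread hand-offs occur per row, which is why it can be treated as a pure performance-tuning parameter.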