AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping (2403.06478v1)
Abstract: With advances in genome sequencing technology, deoxyribonucleic acid (DNA) sequencing reads are rapidly becoming longer, at lower prices than ever. However, longer reads come at the cost of a heavy computational burden when aligning them. For example, aligning sequences to a human reference genome can take tens or even hundreds of hours. The current de facto standard approach to alignment is based on guided dynamic programming. Although this step is time-consuming and could potentially benefit from high-throughput graphics processing units (GPUs), existing GPU-accelerated approaches often compromise the algorithm's structure because its computational pattern is GPU-unfriendly. Unfortunately, such compromises are not tolerable in the field, because sequence alignment is one stage of complex bioinformatics analysis pipelines, and altered alignment results propagate to the stages that follow. Under these circumstances, we propose AGAThA, an exact and efficient GPU-based acceleration of guided sequence alignment. We diagnose and address the sources of the algorithm's GPU-unfriendliness: strided and redundant memory accesses, and workload imbalances that are difficult to predict. In experiments on modern GPUs, AGAThA achieves an 18.8$\times$ speedup over the CPU-based baseline, 9.6$\times$ over the best GPU-based baseline, and 3.6$\times$ over GPU-based algorithms that use different heuristics.
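The abstract refers to guided dynamic programming without spelling it out. The sketch below assumes, as in minimap2's extension alignment, that the guide consists of two heuristics: a diagonal band that limits which cells are computed, and a Z-drop rule that stops the extension once the score falls too far below the best score seen so far. It is a simplified CPU illustration, not AGAThA's kernel: it uses a linear gap penalty instead of affine gaps, and the names `guided_extend` and `ExtensionResult`, as well as every default parameter value, are hypothetical. Cells are visited anti-diagonal by anti-diagonal because cells on the same anti-diagonal do not depend on one another, which is the dimension GPU implementations commonly parallelize.

```python
from typing import NamedTuple


class ExtensionResult(NamedTuple):
    best_score: int   # highest cell score seen before termination
    best_i: int       # query bases consumed at the best cell
    best_j: int       # target bases consumed at the best cell
    zdropped: bool    # True if the Z-drop rule cut the extension short


def guided_extend(query: str, target: str, match: int = 2, mismatch: int = 4,
                  gap: int = 2, band: int = 8, zdrop: int = 20) -> ExtensionResult:
    """Hypothetical banded, Z-drop-terminated extension (linear gaps, sketch only)."""
    n, m = len(query), len(target)
    neg = float("-inf")
    # Full matrix for clarity; a GPU kernel would typically keep only the band.
    H = [[neg] * (m + 1) for _ in range(n + 1)]
    H[0][0] = 0
    best = (0, 0, 0)  # (score, i, j) of the best cell so far

    for d in range(1, n + m + 1):  # anti-diagonal index d = i + j
        diag_best = None
        for i in range(max(0, d - m), min(n, d) + 1):
            j = d - i
            if abs(i - j) > band:  # guide 1: skip cells outside the band
                continue
            cand = neg
            if i > 0 and j > 0:
                s = match if query[i - 1] == target[j - 1] else -mismatch
                cand = H[i - 1][j - 1] + s
            if i > 0:
                cand = max(cand, H[i - 1][j] - gap)
            if j > 0:
                cand = max(cand, H[i][j - 1] - gap)
            H[i][j] = cand
            if diag_best is None or cand > diag_best[0]:
                diag_best = (cand, i, j)
        if diag_best is None:  # no in-band cell on this anti-diagonal
            continue

        score, i, j = diag_best
        if score > best[0]:
            best = (score, i, j)
        elif i >= best[1] and j >= best[2]:
            # guide 2: Z-drop, allowing extra drop proportional to gap drift
            drift = abs((i - best[1]) - (j - best[2]))
            if best[0] - score > zdrop + gap * drift:
                return ExtensionResult(best[0], best[1], best[2], True)

    return ExtensionResult(best[0], best[1], best[2], False)
```

For two identical 10-base sequences, `guided_extend` reports a best score of 20 at (10, 10) without Z-dropping; appending a long mismatching tail to both sequences triggers the Z-drop rule while the best cell stays at (10, 10). Because the termination point depends on the input, the amount of work per read pair is not known in advance, which illustrates one way such workloads can become hard to balance.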