Long-read sequencing technology is becoming increasingly popular for Precision Medicine applications, such as variant calling from Whole Genome Sequencing (WGS), and for metagenomics applications, such as microbial abundance estimation. Minimap2 is the state-of-the-art aligner and mapper used by today's leading long-read sequencing technologies. However, Minimap2 is very slow for long noisy reads: ∼60-70% of its run-time on a CPU comes from the highly sequential chaining step. Meanwhile, most Point-of-Care computational workflows in long-read sequencing use Graphics Processing Units (GPUs). We present minimap2-accelerated (mm2-ax), a heterogeneous design for sequence mapping and alignment in which the compute-intensive chaining step of minimap2 is sped up on the GPU, and we demonstrate its time and cost benefits.
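For reference, the chaining recurrence as given in the original minimap2 paper makes the sequential dependency explicit (mm2-ax may implement a variant of it). Each anchor i is a tuple (x_i, y_i, w_i) giving its reference end, query end and span. f(i) is the best chain score ending at anchor i, h is the number of predecessors examined, and \bar{w} is the average seed length. β(j,i) is treated as infinite when y_j ≥ y_i or when the gap exceeds a maximum distance G. Because f(i) needs the final scores of earlier anchors, the anchors of a read are naturally processed one after another.

```latex
\begin{aligned}
f(i) &= \max\Big\{ \max_{\max(1,\,i-h) \le j < i} \big[ f(j) + \alpha(j,i) - \beta(j,i) \big],\; w_i \Big\} \\
\alpha(j,i) &= \min\big\{ \min\{ y_i - y_j,\; x_i - x_j \},\; w_i \big\} \\
\beta(j,i) &= \gamma_c\big( (y_i - y_j) - (x_i - x_j) \big), \qquad
\gamma_c(l) = \begin{cases} 0.01\,\bar{w}\,|l| + 0.5\log_2 |l| & l \neq 0 \\ 0 & l = 0 \end{cases}
\end{aligned}
```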
We extract better intra-read parallelism from chaining without losing mapping accuracy by forward transforming Minimap2's chaining algorithm. Further, we exploit the large memory available on modern cloud instances to improve performance on the GPU: we convert the sparse vector that defines the chaining workload into a dense one, which raises arithmetic intensity (more operations per byte of data fetched from high-latency global memory). We also optimize for workload balancing, data locality and minimal branch divergence on the GPU. On an NVIDIA A100 GPU, mm2-ax speeds up the chaining step by 5X to 12.6X and achieves a 3.77X to 9.44X speedup:costup over mm2-fast, the fastest version of Minimap2, benchmarked on a single Google Cloud Platform instance with 30 SIMD cores.
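A minimal Python sketch of what forward transforming the chaining loop can look like. It assumes a simplified minimap2-style score (match term α = min(dx, dy, w_i), gap cost 0.01·w̄·|l| + 0.5·log2|l| for diagonal drift l). It also assumes hypothetical `(x, y, w)` anchor tuples sorted by reference position, a predecessor window `h`, and a gap cap `max_gap`. The backward (pull) form looks back at up to `h` predecessors. The forward (push) form has each anchor, once its score is final, update up to `h` successors. This illustrates the idea only and is not mm2-ax's GPU kernel.

```python
import math
from typing import List, Optional, Tuple

Anchor = Tuple[int, int, int]  # hypothetical layout: (x: reference end, y: query end, w: span)


def transition(a_j: Anchor, a_i: Anchor, avg_w: float, max_gap: int) -> Optional[float]:
    """Return alpha(j, i) - beta(j, i) for chaining j before i, or None if the pair is disallowed."""
    xj, yj, _ = a_j
    xi, yi, wi = a_i
    dx, dy = xi - xj, yi - yj
    # Simplification: both coordinates must advance and the gap is capped (beta = infinity otherwise).
    if dx <= 0 or dy <= 0 or max(dx, dy) > max_gap:
        return None
    alpha = min(dx, dy, wi)  # matching bases contributed by anchor i
    l = abs(dy - dx)  # diagonal drift, i.e. gap length
    beta = 0.0 if l == 0 else 0.01 * avg_w * l + 0.5 * math.log2(l)
    return alpha - beta


def backward_chain(anchors: List[Anchor], h: int, max_gap: int) -> List[float]:
    """Pull form: f[i] is a max-reduction over up to h already-final predecessors."""
    if not anchors:
        return []
    avg_w = sum(w for _, _, w in anchors) / len(anchors)
    f: List[float] = []
    for i, a_i in enumerate(anchors):
        best = float(a_i[2])  # starting a new chain at anchor i scores w_i
        for j in range(max(0, i - h), i):
            s = transition(anchors[j], a_i, avg_w, max_gap)
            if s is not None:
                best = max(best, f[j] + s)
        f.append(best)
    return f


def forward_chain(anchors: List[Anchor], h: int, max_gap: int) -> List[float]:
    """Push form: once f[j] is final, anchor j updates up to h successors."""
    if not anchors:
        return []
    avg_w = sum(w for _, _, w in anchors) / len(anchors)
    f = [float(w) for _, _, w in anchors]  # every anchor may start its own chain
    for j, a_j in enumerate(anchors):
        # f[j] is final here: every predecessor j' < j has already pushed into it.
        # The updates below write distinct f[i] and read only f[j].
        for i in range(j + 1, min(len(anchors), j + h + 1)):
            s = transition(a_j, anchors[i], avg_w, max_gap)
            if s is not None:
                f[i] = max(f[i], f[j] + s)
    return f


anchors = [(100, 10, 15), (120, 30, 15), (140, 52, 15), (400, 60, 15)]
print(backward_chain(anchors, h=50, max_gap=5000))
print(forward_chain(anchors, h=50, max_gap=5000))
```

Both functions consider exactly the same predecessor window, so they return identical scores. In the forward form, the successor updates issued by one anchor are independent of each other. That is one plausible way to expose intra-read parallelism, for example by mapping each successor update to a GPU thread.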
mm2-ax is minimap2 sped up on the GPU without losing mapping accuracy. The mm2-ax executable is available at: https://doi.org/10.5281/zenodo.6374533.