Genome sequencing is one of the most important applications in healthcare and has a great potential to realize precision medicine and personalized healthcare. However, its computing process is very time consuming. Even pre-processing the raw sequence data of a whole genome for a single person to the analysis ready data can take several days on a single-core CPU.
In this paper, we propose to accelerate the performance of the widely used Genome Analysis ToolKit (GATK) using FPGAs. More specifically, we focus on the algorithm and hardware codesign for the Base Quality Score Re-calibration (BQSR) step in GATK, which is an important and time-consuming step to correct systematic errors made by a sequencing machine. Prior studies did not consider hardware acceleration for BQSR because it requires a large amount of memory with random access and has a lot of control flow. To address these challenges, we first adapt the algorithm to resolve the random memory access conflicts to achieve a fully pipelined accelerator design and reduce its dataset size. Second, we leverage the newly introduced large-capacity UltraRAM (URAM) in Xilinx UltraScale+ FPGAs to buffer BQSR’s large dataset on chip, and further optimize its operating frequency. Finally, we also explore the coarse-grained pipeline and parallelism to improve the overall performance of the BQSR accelerator. Compared to the latest software implementation of GATK 4.1 running on single-thread and 56-thread CPUs (14nm Xeon E5-2680 v4), our FPGA accelerator running on Xilinx 16nm UltraScale+ VCU1525 board achieves up to 40.7x and 8.5x speedups, respectively.