Martin Luther University Halle-Wittenberg


Arxiv Article
gerbil-2016.pdf (531.2 KB)  vom 25.07.2016

Further settings

Login for editors

Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support

A basic task in bioinformatics is the counting of k-mers in genome strings. The k-mer counting problem is to build a histogram of all substrings of length k in a given genome sequence. We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for k32. Given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. While existing k-mer counting tools suffer from excessive memory resource consumption or degrading performance for large k, Gerbil is able to efficiently support large k without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored at a working disk. In a second step, the temporary files are read again, split into k-mers and counted via a hash table approach. In addition, Gerbil can optionally use GPUs to accelerate the counting step. For large k, we outperform state-of-the-art open source k-mer counting tools for large genome data sets.

A comprehensive description of Gerbil can be found in the arxiv. The source code is located at GitHub. Links to both arxiv and GitHub can be found in the right column . The arxiv-Version of the paper itself is also available for download. Feel free to use our software. For publications, please cite

Erbert, Rechner, and Müller-Hannemann: Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support, Proceedings of WABI 2016.