A benchmark study of k-mer counting methods for high-throughput sequencing

SC Manekar, SR Sathe - GigaScience, 2018 - academic.oup.com
The rapid development of high-throughput sequencing technologies means that hundreds of
gigabytes of sequencing data can be produced in a single study. Many bioinformatics tools …

KMC 3: counting and manipulating k-mer statistics

M Kokot, M Długosz, S Deorowicz - Bioinformatics, 2017 - academic.oup.com
Counting all k-mers in a given dataset is a standard procedure in many bioinformatics
applications. We introduce KMC3, a significant improvement of the former KMC2 algorithm …
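Many k-mer counters treat a k-mer and its reverse complement as the same key, because a read may come from either DNA strand. A minimal sketch of this "canonical k-mer" idea follows, assuming plain uppercase ACGT input with no ambiguous bases. The function names are hypothetical and this is not KMC 3's implementation.

```python
from collections import Counter

# Translation table for base complements (assumes only A, C, G, T appear).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(kmer):
    """Complement each base, then reverse the string."""
    return kmer.translate(COMPLEMENT)[::-1]

def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))

def count_canonical_kmers(reads, k):
    """Count canonical k-mers so both strands contribute to one counter."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts
```

For example, "AAC" and "GTT" are reverse complements of each other, so both map to the single key "AAC".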

Data compression for sequencing data

S Deorowicz, S Grabowski - Algorithms for Molecular Biology, 2013 - Springer
Post-Sanger sequencing methods produce tons of data, and there is a general agreement
that the challenge to store and process them must be addressed with data compression. In …

KMC 2: fast and resource-frugal k-mer counting

S Deorowicz, M Kokot, S Grabowski… - …, 2015 - academic.oup.com
Motivation: Building the histogram of occurrences of every k-symbol long substring of
nucleotide data is a standard step in many bioinformatics applications, known under the …
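The task described here, counting every length-k substring and then tabulating how often each abundance occurs, can be shown with a minimal in-memory sketch. It assumes reads fit in memory and ignores reverse complements and non-ACGT symbols. The function names are hypothetical; disk-based tools such as KMC 2 exist precisely because this naive approach does not scale to large datasets.

```python
from collections import Counter

def count_kmers(reads, k):
    """Exact count of every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        # Reads shorter than k yield no k-mers (the range is empty).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def abundance_histogram(counts):
    """Map each abundance value to the number of distinct k-mers with it."""
    return Counter(counts.values())

reads = ["ACGTACGT", "CGTACG"]
kmer_counts = count_kmers(reads, 3)
histogram = abundance_histogram(kmer_counts)
```

Here ACG and CGT each occur three times and GTA and TAC twice, so the histogram records two distinct k-mers at abundance 3 and two at abundance 2.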

IVA: accurate de novo assembly of RNA virus genomes

M Hunt, A Gall, SH Ong, J Brener, B Ferns… - …, 2015 - academic.oup.com
Motivation: An accurate genome assembly from short read sequencing data is critical for
downstream analysis, for example allowing investigation of variants within a sequenced …

BLESS: bloom filter-based error correction solution for high-throughput sequencing reads

Y Heo, XL Wu, D Chen, J Ma, WM Hwu - Bioinformatics, 2014 - academic.oup.com
Motivation: Rapid advances in next-generation sequencing (NGS) technology have led to
exponential increase in the amount of genomic information. However, NGS reads contain far …
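A Bloom filter answers "have I seen this k-mer?" using a fixed bit array. It can give false positives but never false negatives. The following is a generic sketch of storing k-mers in one, not BLESS's code. The class name, the SHA-256-based double hashing and the sizes are assumptions chosen for illustration.

```python
import hashlib

class BloomFilter:
    """Bit array with num_hashes positions per item; no false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive all positions from one digest via double hashing (h1 + i*h2).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # True only if every position is set; may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bloom = BloomFilter(num_bits=1024, num_hashes=3)
for kmer in ["ACGTA", "CGTAC", "GTACG"]:
    bloom.add(kmer)
```

The memory cost depends only on num_bits and not on k-mer length. This trade-off is what makes the structure attractive when the set of k-mers is very large.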

Gerbil: a fast and memory-efficient k-mer counter with GPU-support

M Erbert, S Rechner, M Müller-Hannemann - Algorithms for Molecular …, 2017 - Springer
A basic task in bioinformatics is the counting of k-mers in genome sequences.
Existing k-mer counting tools are most often optimized for small k < 32 and suffer from …
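One common reason counters favour small k is a representation detail. With 2 bits per nucleotide, a k-mer of up to 32 bases fits in a single 64-bit machine word, so it can be hashed and compared as an integer. The sketch below illustrates that encoding. The A/C/G/T to 0/1/2/3 mapping and the function names are assumptions; individual tools, Gerbil included, may encode bases differently.

```python
# Hypothetical 2-bit code per base; any fixed bijection would work.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"

def pack_kmer(kmer):
    """Pack a DNA k-mer into an int, 2 bits per base (first base = high bits)."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value

def unpack_kmer(value, k):
    """Inverse of pack_kmer for a k-mer of known length k."""
    bases = []
    for _ in range(k):
        bases.append(DECODE[value & 3])
        value >>= 2
    return "".join(reversed(bases))
```

For example, "ACGT" packs to 27 (binary 00 01 10 11), and a 32-base k-mer needs exactly 64 bits. Larger k therefore requires multi-word keys, which is the regime the snippet says many tools handle poorly.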

Simplitigs as an efficient and scalable representation of de Bruijn graphs

K Břinda, M Baym, G Kucherov - Genome Biology, 2021 - Springer
de Bruijn graphs play an essential role in bioinformatics, yet they lack a universal
scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable …

These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure

Q Zhang, J Pell, R Canino-Koning, AC Howe… - PLoS ONE, 2014 - journals.plos.org
K-mer abundance analysis is widely used for many purposes in nucleotide sequence
analysis, including data preprocessing for de novo assembly, repeat detection, and …
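A count-min sketch is one probabilistic data structure for approximate online counting of the kind the title refers to. It keeps several rows of counters, each indexed by a different hash. An item's estimate is the minimum across rows, so without deletions it can overcount because of collisions but never undercounts. The following is a minimal sketch under that description, not the authors' implementation. The class name, sizes and salted SHA-256 hashing are assumptions.

```python
import hashlib

class CountMinSketch:
    """depth rows of width counters; estimates are >= true counts."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Prefix the row number so each row uses a different hash of the item.
        digest = hashlib.sha256(f"{row}:{item}".encode("ascii")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.tables[row][self._index(item, row)] += count

    def estimate(self, item):
        # The least-collided row gives the tightest upper bound.
        return min(self.tables[row][self._index(item, row)] for row in range(self.depth))

sketch = CountMinSketch(width=256, depth=4)
sequence, k = "ACGTACGT", 3
for i in range(len(sequence) - k + 1):
    sketch.add(sequence[i:i + k])
```

Memory is fixed at width × depth counters regardless of how many distinct k-mers arrive. Accuracy degrades gracefully as the table fills.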

Turtle: Identifying frequent k-mers with cache-efficient algorithms

RS Roy, D Bhattacharya, A Schliep - Bioinformatics, 2014 - academic.oup.com
Motivation: Counting the frequencies of k-mers in read libraries is often a first step in the
analysis of high-throughput sequencing data. Infrequent k-mers are assumed to be a result …
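Because infrequent k-mers are assumed to come from sequencing errors, a counter can avoid allocating a counter for a k-mer until it has been seen twice. The sketch below shows that general "count only after the first sighting" idea. An exact set stands in for the memory-frugal probabilistic filter such tools use. It is not Turtle's algorithm, and the function name and min_count parameter are hypothetical.

```python
from collections import Counter

def frequent_kmers(reads, k, min_count=2):
    """Counts for k-mers seen at least min_count times (assumes min_count >= 2)."""
    seen_once = set()   # stand-in for a Bloom filter: k-mers sighted exactly once
    counts = Counter()  # counters are created only on the second sighting
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
            elif kmer in seen_once:
                counts[kmer] = 2
            else:
                seen_once.add(kmer)
    return {kmer: c for kmer, c in counts.items() if c >= min_count}

result = frequent_kmers(["ACGTACGT", "CGTACG", "GGGA"], 3)
```

The singletons GGG and GGA from the last read never receive a counter. When most distinct k-mers are error-induced singletons, this keeps the counting table small.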