A benchmark study of k-mer counting methods for high-throughput sequencing
SC Manekar, SR Sathe - GigaScience, 2018 - academic.oup.com
The rapid development of high-throughput sequencing technologies means that hundreds of
gigabytes of sequencing data can be produced in a single study. Many bioinformatics tools …
KMC 3: counting and manipulating k-mer statistics
Counting all k-mers in a given dataset is a standard procedure in many bioinformatics
applications. We introduce KMC3, a significant improvement of the former KMC2 algorithm …
Data compression for sequencing data
S Deorowicz, S Grabowski - Algorithms for Molecular Biology, 2013 - Springer
Post-Sanger sequencing methods produce tons of data, and there is a general agreement
that the challenge to store and process them must be addressed with data compression. In …
KMC 2: fast and resource-frugal k-mer counting
S Deorowicz, M Kokot, S Grabowski… - …, 2015 - academic.oup.com
Motivation: Building the histogram of occurrences of every k-symbol long substring of
nucleotide data is a standard step in many bioinformatics applications, known under the …
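As a minimal illustration of the task the KMC 2 abstract describes, the Python sketch below counts every length-k substring across a set of reads, then derives the abundance histogram (how many distinct k-mers occur once, twice, and so on). It is an exact, in-memory toy: the function names are hypothetical, k-mers containing N are simply skipped, reverse complements are not merged, and none of the disk-based, memory-frugal techniques of the tools listed here are reproduced.

```python
from collections import Counter


def count_kmers(reads, k):
    """Exact count of every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # skip k-mers that span an ambiguous base
                continue
            counts[kmer] += 1
    return counts


def abundance_histogram(counts):
    """Map each occurrence count to the number of distinct k-mers with that count."""
    return Counter(counts.values())


reads = ["ACGTACGT", "CGTACGTT"]
counts = count_kmers(reads, 3)
print(sorted(counts.items()))
print(sorted(abundance_histogram(counts).items()))
```

A plain hash table like this grows with the number of distinct k-mers, which is exactly the memory pressure that dedicated counters are designed to avoid on large datasets.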
IVA: accurate de novo assembly of RNA virus genomes
Motivation: An accurate genome assembly from short read sequencing data is critical for
downstream analysis, for example allowing investigation of variants within a sequenced …
BLESS: bloom filter-based error correction solution for high-throughput sequencing reads
Motivation: Rapid advances in next-generation sequencing (NGS) technology have led to
exponential increase in the amount of genomic information. However, NGS reads contain far …
Gerbil: a fast and memory-efficient k-mer counter with GPU-support
M Erbert, S Rechner, M Müller-Hannemann - Algorithms for Molecular …, 2017 - Springer
Background A basic task in bioinformatics is the counting of k-mers in genome sequences.
Existing k-mer counting tools are most often optimized for small k < 32 and suffer from …
Simplitigs as an efficient and scalable representation of de Bruijn graphs
Abstract de Bruijn graphs play an essential role in bioinformatics, yet they lack a universal
scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable …
These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure
K-mer abundance analysis is widely used for many purposes in nucleotide sequence
analysis, including data preprocessing for de novo assembly, repeat detection, and …
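The snippet does not say which probabilistic data structure is used; as an assumption, the sketch below uses a Count-Min Sketch, one common choice for approximate online counting. Each k-mer increments one cell in each of several hashed rows, and a query returns the minimum across rows, so collisions can inflate an estimate but never reduce it below the true count. The class name, the SHA-256-based row hashing, and the table sizes are hypothetical choices for illustration only.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter: estimates may overcount, never undercount."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        # One row of counters per hash function.
        self.tables = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Prefix the input with the row number to get a different hash per row.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item: str) -> None:
        for row in range(self.depth):
            self.tables[row][self._index(item, row)] += 1

    def estimate(self, item: str) -> int:
        # Collisions only ever add to a cell, so the smallest cell is the tightest bound.
        return min(self.tables[row][self._index(item, row)] for row in range(self.depth))


def stream_kmers(reads, k):
    """Yield k-mers one at a time, so reads can be processed in a single pass."""
    for read in reads:
        for i in range(len(read) - k + 1):
            yield read[i:i + k]


sketch = CountMinSketch(width=1024, depth=4)
for kmer in stream_kmers(["ACGTACGT", "CGTACGTT"], 3):
    sketch.add(kmer)
print(sketch.estimate("CGT"))
```

Memory stays fixed at width × depth counters no matter how many distinct k-mers stream past; the price is that rare k-mers can be overestimated when the table is too small for the data.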
Turtle: Identifying frequent k-mers with cache-efficient algorithms
RS Roy, D Bhattacharya, A Schliep - Bioinformatics, 2014 - academic.oup.com
Motivation: Counting the frequencies of k-mers in read libraries is often a first step in the
analysis of high-throughput sequencing data. Infrequent k-mers are assumed to be a result …