Data structures based on k-mers for querying large collections of sequencing data sets
High-throughput sequencing data sets are usually deposited in public repositories (eg, the
European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached …
European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached …
Creating and using minimizer sketches in computational genomics
Processing large data sets has become an essential part of computational genomics.
Greatly increased availability of sequence data from multiple sources has fueled …
Greatly increased availability of sequence data from multiple sources has fueled …
[HTML][HTML] Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs
Memory consumption of de Bruijn graphs is often prohibitive. Most de Bruijn graph-based
assemblers reduce the complexity by compacting paths into single vertices, but this is …
assemblers reduce the complexity by compacting paths into single vertices, but this is …
Sparse and skew hashing of k-mers
GE Pibiri - Bioinformatics, 2022 - academic.oup.com
Motivation A dictionary of k-mers is a data structure that stores a set of n distinct k-mers and
supports membership queries. This data structure is at the hearth of many important tasks in …
supports membership queries. This data structure is at the hearth of many important tasks in …
REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets
Motivation In this work we present REINDEER, a novel computational method that performs
indexing of sequences and records their abundances across a collection of datasets. To the …
indexing of sequences and records their abundances across a collection of datasets. To the …
[HTML][HTML] Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2
The de Bruijn graph is a key data structure in modern computational genomics, and
construction of its compacted variant resides upstream of many genomic analyses. As the …
construction of its compacted variant resides upstream of many genomic analyses. As the …
Representation of k-Mer Sets Using Spectrum-Preserving String Sets
A Rahman, P Medevedev - Journal of Computational Biology, 2021 - liebertpub.com
Given the popularity and elegance of k-mer-based tools, finding a space-efficient way to
represent a set of k-mers is important for improving the scalability of bioinformatics analyses …
represent a set of k-mers is important for improving the scalability of bioinformatics analyses …
Small Searchable κ-Spectra via Subset Rank Queries on the Spectral Burrows-Wheeler Transform
JN Alanko, SJ Puglisi, J Vuohtoniemi - SIAM Conference on Applied and …, 2023 - SIAM
The κ-spectrum of a string is the set of all distinct substrings of length κ occurring in the
string. This is a lossy but computationally convenient representation of the information in the …
string. This is a lossy but computationally convenient representation of the information in the …
Conway–Bromage–Lyndon (CBL): an exact, dynamic representation of k-mer sets
In this article, we introduce the Conway–Bromage–Lyndon (CBL) structure, a compressed,
dynamic and exact method for representing k-mer sets. Originating from Conway and …
dynamic and exact method for representing k-mer sets. Originating from Conway and …
Efficient minimizer orders for large values of k using minimum decycling sets
Minimizers are ubiquitously used in data structures and algorithms for efficient searching,
mapping, and indexing of high-throughput DNA sequencing data. Minimizer schemes select …
mapping, and indexing of high-throughput DNA sequencing data. Minimizer schemes select …