REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets

C Marchet, C Boucher, SJ Puglisi, P Medvedev… - Genome …, 2021 - genome.cshlp.org

High-throughput sequencing data sets are usually deposited in public repositories (e.g., the
European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached …

Cited by 99 · Related articles · All 18 versions

Creating and using minimizer sketches in computational genomics

H Zheng, G Marçais, C Kingsford - Journal of Computational …, 2023 - liebertpub.com

Processing large data sets has become an essential part of computational genomics.
Greatly increased availability of sequence data from multiple sources has fueled …

Cited by 6 · Related articles · All 5 versions

Effective sequence similarity detection with strobemers

K Sahlin - Genome Research, 2021 - genome.cshlp.org

k-mer-based methods are widely used in bioinformatics for various types of sequence
comparisons. However, a single mutation will mutate k consecutive k-mers and make most k …

Cited by 52 · Related articles · All 7 versions

Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2

J Khan, M Kokot, S Deorowicz, R Patro - Genome Biology, 2022 - Springer

The de Bruijn graph is a key data structure in modern computational genomics, and
construction of its compacted variant resides upstream of many genomic analyses. As the …

Cited by 26 · Related articles · All 15 versions

Lossless indexing with counting de Bruijn graphs

M Karasikov, H Mustafa, G Rätsch, A Kahles - Genome Research, 2022 - genome.cshlp.org

Sequencing data are rapidly accumulating in public repositories. Making this resource
accessible for interactive analysis at scale requires efficient approaches for its storage and …

Cited by 21 · Related articles · All 12 versions

Kmtricks: efficient and flexible construction of Bloom filters for large sequencing data collections

T Lemane, P Medvedev, R Chikhi… - Bioinformatics …, 2022 - academic.oup.com

When indexing large collections of short-read sequencing data, a common operation that
has now been implemented in several tools (Sequence Bloom Trees and variants, BIGSI) is …

Cited by 33 · Related articles · All 16 versions

Simplitigs as an efficient and scalable representation of de Bruijn graphs

K Břinda, M Baym, G Kucherov - Genome Biology, 2021 - Springer

Abstract: de Bruijn graphs play an essential role in bioinformatics, yet they lack a universal
scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable …

Cited by 42 · Related articles · All 23 versions

BLight: efficient exact associative structure for k-mers

C Marchet, M Kerbiriou, A Limasset - Bioinformatics, 2021 - academic.oup.com

Motivation: A plethora of methods and applications share the fundamental need to associate
information to words for high-throughput sequence analysis. Doing so for billions of k-mers …

Cited by 28 · Related articles · All 7 versions

Conway–Bromage–Lyndon (CBL): an exact, dynamic representation of k-mer sets

I Martayan, B Cazaux, A Limasset, C Marchet - Bioinformatics, 2024 - academic.oup.com

In this article, we introduce the Conway–Bromage–Lyndon (CBL) structure, a compressed,
dynamic and exact method for representing k-mer sets. Originating from Conway and …

Cited by 3 · Related articles · All 3 versions

MetaProFi: an ultrafast chunked Bloom filter for storing and querying protein and nucleotide sequence data for accurate identification of functionally relevant genetic …

SK Srikakulam, S Keller, F Dabbaghie, R Bals… - …, 2023 - academic.oup.com

Motivation: Bloom filters are a popular data structure that allows rapid searches in large
sequence datasets. So far, all tools work with nucleotide sequences; however, protein …

Cited by 7 · Related articles · All 9 versions