Metagraph: Indexing and analysing nucleotide archives at petabase-scale

M Karasikov, H Mustafa, D Danciu, C Barber… - BioRxiv, 2020 - biorxiv.org
The amount of biological sequencing data available in public repositories is growing
exponentially, forming an invaluable biomedical research resource. Yet, making all this …

Meta-colored compacted de Bruijn graphs

GE Pibiri, J Fan, R Patro - International Conference on Research in …, 2024 - Springer
The colored compacted de Bruijn graph (c-dBG) has become a fundamental tool used
across several areas of genomics and pangenomics. For example, it has been widely …

kmerDB: a database encompassing the set of genomic and proteomic sequence information for each species

I Mouratidis, FA Baltoumas, N Chantzi… - Computational and …, 2024 - Elsevier
The decrease in sequencing expenses has facilitated the creation of reference genomes
and proteomes for an expanding array of organisms. Nevertheless, no established …

Label-guided seed-chain-extend alignment on annotated De Bruijn graphs

H Mustafa, M Karasikov, N Mansouri Ghiasi… - …, 2024 - academic.oup.com
Motivation: Exponential growth in sequencing databases has motivated scalable De Bruijn
graph-based (DBG) indexing for searching these data, using annotations to label nodes with …

Designing efficient randstrobes for sequence similarity analyses

M Karami, A Soltani Mohammadi, M Martin… - …, 2024 - academic.oup.com
Motivation: Substrings of length k, commonly referred to as k-mers, play a vital role in
sequence analysis. However, k-mers are limited to exact matches between sequences …

MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing

NM Ghiasi, M Sadrosadati, H Mustafa… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Metagenomics, the study of the genome sequences of diverse organisms in a common
environment, has led to significant advances in many fields. Since the species present in a …

MetaGraph-MLA: Label-guided alignment to variable-order De Bruijn graphs

H Mustafa, M Karasikov, G Rätsch, A Kahles - bioRxiv, 2022 - biorxiv.org
The amount of data stored in genomic sequence databases is growing exponentially, far
exceeding traditional indexing strategies' processing capabilities. Many recent indexing …

Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets

I Martayan, B Cazaux, A Limasset, C Marchet - bioRxiv, 2024 - biorxiv.org
In this paper, we introduce the Conway-Bromage-Lyndon (CBL) structure, a compressed,
dynamic and exact method for representing k-mer sets. Originating from Conway and …

Movi: a fast and cache-efficient full-text pangenome index

M Zakeri, NK Brown, OY Ahmed, T Gagie, B Langmead - bioRxiv, 2023 - ncbi.nlm.nih.gov
Efficient pangenome indexes are promising tools for many applications, including rapid
classification of nanopore sequencing reads. Recently, a compressed-index data structure …

Where the patterns are: repetition-aware compression for colored de Bruijn graphs

A Campanelli, GE Pibiri, J Fan, R Patro - bioRxiv, 2024 - biorxiv.org
We describe lossless compressed data structures for the colored de Bruijn graph (or, c-dBG).
Given a collection of reference sequences, a c-dBG can be essentially regarded as a …