Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Data Structures to Represent a Set of k-long DNA Sequences

R Chikhi, J Holub, P Medvedev - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
The analysis of biological sequencing data has been one of the biggest applications of
string algorithms. The approaches used in many such applications are based on the …

Fully functional suffix trees and optimal text searching in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Journal of the ACM (JACM), 2020 - dl.acm.org
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

MONI: a pangenomic index for finding maximal exact matches

M Rossi, M Oliva, B Langmead, T Gagie… - Journal of …, 2022 - liebertpub.com
Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store
thousands of human genomes on a commodity computer. Then Kuhnle et al. showed how to …

Resolution of the burrows-wheeler transform conjecture

D Kempa, T Kociumaka - Communications of the ACM, 2022 - dl.acm.org
Abstract The Burrows-Wheeler Transform (BWT) is an invertible text transformation that
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …

Recycler: an algorithm for detecting plasmids from de novo assembly graphs

R Rozov, A Brown Kav, D Bogumil, N Shterzer… - …, 2017 - academic.oup.com
Motivation Plasmids and other mobile elements are central contributors to microbial
evolution and genome innovation. Recently, they have been found to have important roles in …

Optimal-time text indexing in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Proceedings of the Twenty-Ninth Annual ACM …, 2018 - SIAM
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

[HTML][HTML] Wheeler graphs: A framework for BWT-based data structures

T Gagie, G Manzini, J Sirén - Theoretical computer science, 2017 - Elsevier
Abstract The famous Burrows–Wheeler Transform (BWT) was originally defined for a single
string but variations have been developed for sets of strings, labeled trees, de Bruijn graphs …

A learned approach to design compressed rank/select data structures

A Boffa, P Ferragina, G Vinciguerra - ACM Transactions on Algorithms …, 2022 - dl.acm.org
We address the problem of designing, implementing, and experimenting with compressed
data structures that support rank and select queries over a dictionary of integers. We shine a …

Dynamic suffix array with polylogarithmic queries and updates

D Kempa, T Kociumaka - Proceedings of the 54th Annual ACM SIGACT …, 2022 - dl.acm.org
The suffix array SA [1.. n] of a text T of length n is a permutation of {1,…, n} describing the
lexicographical ordering of suffixes of T and is considered to be one of the most important …