Indexing highly repetitive string collections, part II: Compressed indexes
G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …
Fully functional suffix trees and optimal text searching in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
At the roots of dictionary compression: string attractors
A well-known fact in the field of lossless text compression is that high-order entropy is a
weak model when the input contains long repetitions. Motivated by this fact, decades of …
Resolution of the Burrows-Wheeler transform conjecture
D Kempa, T Kociumaka - Communications of the ACM, 2022 - dl.acm.org
The Burrows-Wheeler Transform (BWT) is an invertible text transformation that
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …
Optimal-time text indexing in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
Towards a definitive measure of repetitiveness
Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no
such clear measure exists for the compressibility of repetitive sequences. Since statistical …
Optimal-time dictionary-compressed indexes
AR Christiansen, MB Ettienne, T Kociumaka… - ACM Transactions on …, 2020 - dl.acm.org
We describe the first self-indexes able to count and locate pattern occurrences in optimal
time within a space bounded by the size of the most popular dictionary compressors. To …
Refining the r-index
Gagie, Navarro and Prezza's r-index (SODA, 2018) promises to speed up DNA
alignment and variation calling by allowing us to index entire genomic databases, provided …
Toward a definitive compressibility measure for repetitive sequences
T Kociumaka, G Navarro… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
While the kth order empirical entropy is an accepted measure of the compressibility of
individual sequences on classical text collections, it is useful only for small values of k and …
Universal compressed text indexing
The rise of repetitive datasets has lately generated a lot of interest in compressed self-
indexes based on dictionary compression, a rich and heterogeneous family of techniques …