Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Fully functional suffix trees and optimal text searching in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Journal of the ACM (JACM), 2020 - dl.acm.org
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

At the roots of dictionary compression: string attractors

D Kempa, N Prezza - Proceedings of the 50th Annual ACM SIGACT …, 2018 - dl.acm.org
A well-known fact in the field of lossless text compression is that high-order entropy is a
weak model when the input contains long repetitions. Motivated by this fact, decades of …

Optimal-time text indexing in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Proceedings of the Twenty-Ninth Annual ACM …, 2018 - SIAM
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

Towards a definitive measure of repetitiveness

T Kociumaka, G Navarro, N Prezza - Latin American Symposium on …, 2020 - Springer
Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no
such clear measure exists for the compressibility of repetitive sequences. Since statistical …

Balancing straight-line programs

M Ganardi, A Jeż, M Lohrey - Journal of the ACM (JACM), 2021 - dl.acm.org
We show that a context-free grammar of size m that produces a single string w of length n (such a
grammar is also called a string straight-line program) can be transformed in linear time into a …

Toward a definitive compressibility measure for repetitive sequences

T Kociumaka, G Navarro… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
While the kth order empirical entropy is an accepted measure of the compressibility of
individual sequences on classical text collections, it is useful only for small values of k and …

Universal compressed text indexing

G Navarro, N Prezza - Theoretical Computer Science, 2019 - Elsevier
The rise of repetitive datasets has lately generated a lot of interest in compressed self-
indexes based on dictionary compression, a rich and heterogeneous family of techniques …

Grammar-compressed indexes with logarithmic search time

F Claude, G Navarro, A Pacheco - Journal of Computer and System …, 2021 - Elsevier
Let a text T[1..n] be the only string generated by a context-free grammar with g
(terminal and nonterminal) symbols, and of size G (measured as the sum of the lengths of …

Block trees

D Belazzougui, M Cáceres, T Gagie… - Journal of Computer and …, 2021 - Elsevier
Let string S[1..n] be parsed into z phrases by the Lempel-Ziv algorithm. The
corresponding compression algorithm encodes S in O(z) space, but it does not support …