Indexing highly repetitive string collections, part II: Compressed indexes
G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …
Fully functional suffix trees and optimal text searching in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
At the roots of dictionary compression: string attractors
A well-known fact in the field of lossless text compression is that high-order entropy is a
weak model when the input contains long repetitions. Motivated by this fact, decades of …
Optimal-time text indexing in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
Towards a definitive measure of repetitiveness
Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no
such clear measure exists for the compressibility of repetitive sequences. Since statistical …
Balancing straight-line programs
We show that a context-free grammar of size that produces a single string of length (such a
grammar is also called a string straight-line program) can be transformed in linear time into a …
Toward a definitive compressibility measure for repetitive sequences
T Kociumaka, G Navarro… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
While the k-th order empirical entropy is an accepted measure of the compressibility of
individual sequences on classical text collections, it is useful only for small values of k and …
Universal compressed text indexing
The rise of repetitive datasets has lately generated a lot of interest in compressed self-
indexes based on dictionary compression, a rich and heterogeneous family of techniques …
Grammar-compressed indexes with logarithmic search time
Let a text T[1..n] be the only string generated by a context-free grammar with g
(terminal and nonterminal) symbols, and of size G (measured as the sum of the lengths of …
Block trees
Let string S[1..n] be parsed into z phrases by the Lempel-Ziv algorithm. The
corresponding compression algorithm encodes S in O(z) space, but it does not support …