Indexing highly repetitive string collections, part II: Compressed indexes
G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …
represent them within their compressed space while at the same time offering indexed …
Fully functional suffix trees and optimal text searching in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
versioned text collections—has become an important problem since the turn of the …
At the roots of dictionary compression: string attractors
A well-known fact in the field of lossless text compression is that high-order entropy is a
weak model when the input contains long repetitions. Motivated by this fact, decades of …
weak model when the input contains long repetitions. Motivated by this fact, decades of …
Resolution of the burrows-wheeler transform conjecture
D Kempa, T Kociumaka - Communications of the ACM, 2022 - dl.acm.org
Abstract The Burrows-Wheeler Transform (BWT) is an invertible text transformation that
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …
Optimal-time text indexing in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
versioned text collections—has become an important problem since the turn of the …
Towards a definitive measure of repetitiveness
Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no
such clear measure exists for the compressibility of repetitive sequences. Since statistical …
such clear measure exists for the compressibility of repetitive sequences. Since statistical …
Dynamic suffix array with polylogarithmic queries and updates
D Kempa, T Kociumaka - Proceedings of the 54th Annual ACM SIGACT …, 2022 - dl.acm.org
The suffix array SA [1.. n] of a text T of length n is a permutation of {1,…, n} describing the
lexicographical ordering of suffixes of T and is considered to be one of the most important …
lexicographical ordering of suffixes of T and is considered to be one of the most important …
An upper bound and linear-space queries on the LZ-End parsing
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …
[HTML][HTML] Sensitivity of string compressors and repetitiveness measures
T Akagi, M Funakoshi, S Inenaga - Information and Computation, 2023 - Elsevier
The sensitivity of a string compression algorithm C asks how much the output size C (T) for
an input string T can increase when a single character edit operation is performed on T. This …
an input string T can increase when a single character edit operation is performed on T. This …
[HTML][HTML] Universal compressed text indexing
The rise of repetitive datasets has lately generated a lot of interest in compressed self-
indexes based on dictionary compression, a rich and heterogeneous family of techniques …
indexes based on dictionary compression, a rich and heterogeneous family of techniques …