Indexing highly repetitive string collections, part II: Compressed indexes
G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …
Fully functional suffix trees and optimal text searching in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
Wavelet trees for all
G Navarro - Journal of Discrete Algorithms, 2014 - Elsevier
The wavelet tree is a versatile data structure that serves a number of purposes, from string
processing to computational geometry. It can be regarded as a device that represents a …
Resolution of the Burrows-Wheeler Transform conjecture
D Kempa, T Kociumaka - Communications of the ACM, 2022 - dl.acm.org
The Burrows-Wheeler Transform (BWT) is an invertible text transformation that
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …
Optimal-time text indexing in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
On compressing and indexing repetitive sequences
S Kreft, G Navarro - Theoretical Computer Science, 2013 - Elsevier
We introduce LZ-End, a new member of the Lempel–Ziv family of text compressors, which
achieves compression ratios close to those of LZ77 but is much faster at extracting arbitrary …
An upper bound and linear-space queries on the LZ-End parsing
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …
LZ77-like compression with fast random access
S Kreft, G Navarro - 2010 Data Compression Conference, 2010 - ieeexplore.ieee.org
We introduce an alternative Lempel-Ziv text parsing, LZ-End, that converges to the entropy
and in practice gets very close to LZ77. LZ-End forces sources to finish at the end of a …
Self-indexed grammar-based compression
Self-indexes aim at representing text collections in a compressed format that allows
extracting arbitrary portions and also offers indexed searching on the collection. Current self …
Sensitivity of string compressors and repetitiveness measures
T Akagi, M Funakoshi, S Inenaga - Information and Computation, 2023 - Elsevier
The sensitivity of a string compression algorithm C asks how much the output size C(T) for
an input string T can increase when a single character edit operation is performed on T. This …