Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Fully functional suffix trees and optimal text searching in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Journal of the ACM (JACM), 2020 - dl.acm.org
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

[HTML][HTML] Wavelet trees for all

G Navarro - Journal of Discrete Algorithms, 2014 - Elsevier
The wavelet tree is a versatile data structure that serves a number of purposes, from string
processing to computational geometry. It can be regarded as a device that represents a …

Resolution of the burrows-wheeler transform conjecture

D Kempa, T Kociumaka - Communications of the ACM, 2022 - dl.acm.org
Abstract The Burrows-Wheeler Transform (BWT) is an invertible text transformation that
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …

Optimal-time text indexing in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Proceedings of the Twenty-Ninth Annual ACM …, 2018 - SIAM
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

On compressing and indexing repetitive sequences

S Kreft, G Navarro - Theoretical Computer Science, 2013 - Elsevier
We introduce LZ-End, a new member of the Lempel–Ziv family of text compressors, which
achieves compression ratios close to those of LZ77 but is much faster at extracting arbitrary …

An upper bound and linear-space queries on the LZ-End parsing

D Kempa, B Saha - Proceedings of the 2022 Annual ACM-SIAM …, 2022 - SIAM
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …

LZ77-like compression with fast random access

S Kreft, G Navarro - 2010 Data Compression Conference, 2010 - ieeexplore.ieee.org
We introduce an alternative Lempel-Ziv text parsing, LZ-End, that converges to the entropy
and in practice gets very close to LZ77. LZ-End forces sources to finish at the end of a …

Self-indexed grammar-based compression

F Claude, G Navarro - Fundamenta Informaticae, 2011 - content.iospress.com
Self-indexes aim at representing text collections in a compressed format that allows
extracting arbitrary portions and also offers indexed searching on the collection. Current self …

[HTML][HTML] Sensitivity of string compressors and repetitiveness measures

T Akagi, M Funakoshi, S Inenaga - Information and Computation, 2023 - Elsevier
The sensitivity of a string compression algorithm C asks how much the output size C (T) for
an input string T can increase when a single character edit operation is performed on T. This …