Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Fully functional suffix trees and optimal text searching in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Journal of the ACM (JACM), 2020 - dl.acm.org
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

Toward a definitive compressibility measure for repetitive sequences

T Kociumaka, G Navarro… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
While the kth order empirical entropy is an accepted measure of the compressibility of
individual sequences on classical text collections, it is useful only for small values of k and …

On stricter reachable repetitiveness measures

G Navarro, C Urbina - International Symposium on String Processing and …, 2021 - Springer
The size b of the smallest bidirectional macro scheme, which is arguably the most general
copy-paste scheme to generate a given sequence, is considered to be the strictest …

Iterated straight-line programs

G Navarro, C Urbina - Latin American Symposium on Theoretical …, 2024 - Springer
We explore an extension to straight-line programs (SLPs) that outperforms, for some text
families, the measure δ based on substring complexity, a lower bound for most measures …

On the impact of morphisms on BWT-Runs

G Fici, G Romana, M Sciortino… - 34th Annual Symposium …, 2023 - drops.dagstuhl.de
Morphisms are widely studied combinatorial objects that can be used for generating infinite
families of words. In the context of Information theory, injective morphisms are called …

Height-bounded Lempel-Ziv encodings

H Bannai, M Funakoshi, D Hendrian… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce height-bounded LZ encodings (LZHB), a new family of compressed
representations that is a variant of Lempel-Ziv parsings with a focus on allowing fast access …

Computing NP-hard repetitiveness measures via MAX-SAT

H Bannai, K Goto, M Ishihata, S Kanda, D Köppl… - arXiv preprint arXiv …, 2022 - arxiv.org
Repetitiveness measures reveal profound characteristics of datasets, and give rise to
compressed data structures and algorithms working in compressed space. Alas, the …

On the hardness of smallest RLSLPs and collage systems

A Kawamoto, I Tomohiro, D Köppl… - 2024 Data …, 2024 - ieeexplore.ieee.org
Akiyoshi Kawamoto, Tomohiro I, Dominik Köppl, …

LZRR: LZ77 parsing with right reference

T Nishimoto, Y Tabei - Information and Computation, 2022 - Elsevier
Lossless data compression has been widely studied in computer science. One of the most
widely used lossless data compressions is Lempel-Ziv (LZ) 77 parsing, which achieves a …