Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Towards a definitive measure of repetitiveness

T Kociumaka, G Navarro, N Prezza - Latin American Symposium on …, 2020 - Springer
Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no
such clear measure exists for the compressibility of repetitive sequences. Since statistical …

Optimal-time dictionary-compressed indexes

AR Christiansen, MB Ettienne, T Kociumaka… - ACM Transactions on …, 2020 - dl.acm.org
We describe the first self-indexes able to count and locate pattern occurrences in optimal
time within a space bounded by the size of the most popular dictionary compressors. To …

Near-Optimal Search Time in -Optimal Space, and Vice Versa

T Kociumaka, G Navarro, F Olivares - Algorithmica, 2024 - Springer
Two recent lower bounds on the compressibility of repetitive sequences, δ≤ γ, have
received much attention. It has been shown that a length-n string S over an alphabet of size …

Computing MEMs and Relatives on Repetitive Text Collections

G Navarro - arXiv preprint arXiv:2210.09914, 2022 - arxiv.org
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given
pattern $ P [1.. m] $ on a large repetitive text collection $ T [1.. n] $, which is represented as a …

Toward a definitive compressibility measure for repetitive sequences

T Kociumaka, G Navarro… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
While the th order empirical entropy is an accepted measure of the compressibility of
individual sequences on classical text collections, it is useful only for small values of and …

Near-optimal quantum algorithms for bounded edit distance and lempel-ziv factorization

D Gibney, C Jin, T Kociumaka, SV Thankachan - Proceedings of the 2024 …, 2024 - SIAM
Measuring sequence similarity and compressing texts are among the most fundamental
tasks in string algorithms. In this work, we develop near-optimal quantum algorithms for the …

On repetitiveness measures of Thue-Morse words

K Kutsukake, T Matsumoto, Y Nakashima… - … Symposium on String …, 2020 - Springer
We show that the size γ (t_n) of the smallest string attractor of the n-th Thue-Morse word t_n
is 4 for any n ≥ 4, disproving the conjecture by Mantaci et al. ICTCS 2019 that it is n. We …

Sublinear time Lempel-Ziv (LZ77) factorization

J Ellert - International Symposium on String Processing and …, 2023 - Springer
Abstract The Lempel-Ziv (LZ77) factorization of a string is a widely-used algorithmic tool that
plays a central role in data compression and indexing. For a length-n string over integer …

Reordering Functions in Mobiles Apps for Reduced Size and Faster Start-Up

E Hoag, K Lee, J Mestre, S Pupyrev… - ACM Transactions on …, 2024 - dl.acm.org
Function layout, also known as function reordering or function placement, is one of the most
effective profile-guided compiler optimizations. By reordering functions in a binary, compilers …