Indexing highly repetitive string collections, part II: Compressed indexes
G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …
represent them within their compressed space while at the same time offering indexed …
Towards a definitive measure of repetitiveness
Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no
such clear measure exists for the compressibility of repetitive sequences. Since statistical …
such clear measure exists for the compressibility of repetitive sequences. Since statistical …
Optimal-time dictionary-compressed indexes
AR Christiansen, MB Ettienne, T Kociumaka… - ACM Transactions on …, 2020 - dl.acm.org
We describe the first self-indexes able to count and locate pattern occurrences in optimal
time within a space bounded by the size of the most popular dictionary compressors. To …
time within a space bounded by the size of the most popular dictionary compressors. To …
Near-Optimal Search Time in -Optimal Space, and Vice Versa
Two recent lower bounds on the compressibility of repetitive sequences, δ≤ γ, have
received much attention. It has been shown that a length-n string S over an alphabet of size …
received much attention. It has been shown that a length-n string S over an alphabet of size …
Computing MEMs and Relatives on Repetitive Text Collections
G Navarro - arXiv preprint arXiv:2210.09914, 2022 - arxiv.org
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given
pattern $ P [1.. m] $ on a large repetitive text collection $ T [1.. n] $, which is represented as a …
pattern $ P [1.. m] $ on a large repetitive text collection $ T [1.. n] $, which is represented as a …
Toward a definitive compressibility measure for repetitive sequences
T Kociumaka, G Navarro… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
While the th order empirical entropy is an accepted measure of the compressibility of
individual sequences on classical text collections, it is useful only for small values of and …
individual sequences on classical text collections, it is useful only for small values of and …
Near-optimal quantum algorithms for bounded edit distance and lempel-ziv factorization
Measuring sequence similarity and compressing texts are among the most fundamental
tasks in string algorithms. In this work, we develop near-optimal quantum algorithms for the …
tasks in string algorithms. In this work, we develop near-optimal quantum algorithms for the …
On repetitiveness measures of Thue-Morse words
K Kutsukake, T Matsumoto, Y Nakashima… - … Symposium on String …, 2020 - Springer
We show that the size γ (t_n) of the smallest string attractor of the n-th Thue-Morse word t_n
is 4 for any n ≥ 4, disproving the conjecture by Mantaci et al. ICTCS 2019 that it is n. We …
is 4 for any n ≥ 4, disproving the conjecture by Mantaci et al. ICTCS 2019 that it is n. We …
Sublinear time Lempel-Ziv (LZ77) factorization
J Ellert - International Symposium on String Processing and …, 2023 - Springer
Abstract The Lempel-Ziv (LZ77) factorization of a string is a widely-used algorithmic tool that
plays a central role in data compression and indexing. For a length-n string over integer …
plays a central role in data compression and indexing. For a length-n string over integer …
Reordering Functions in Mobiles Apps for Reduced Size and Faster Start-Up
Function layout, also known as function reordering or function placement, is one of the most
effective profile-guided compiler optimizations. By reordering functions in a binary, compilers …
effective profile-guided compiler optimizations. By reordering functions in a binary, compilers …