Indexing highly repetitive string collections, part II: Compressed indexes
G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …
Fully functional suffix trees and optimal text searching in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
Toward a definitive compressibility measure for repetitive sequences
T Kociumaka, G Navarro… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
While the kth order empirical entropy is an accepted measure of the compressibility of
individual sequences on classical text collections, it is useful only for small values of k and …
On stricter reachable repetitiveness measures
The size b of the smallest bidirectional macro scheme, which is arguably the most general
copy-paste scheme to generate a given sequence, is considered to be the strictest …
Iterated straight-line programs
We explore an extension to straight-line programs (SLPs) that outperforms, for some text
families, the measure δ based on substring complexity, a lower bound for most measures …
On the impact of morphisms on BWT-Runs
Morphisms are widely studied combinatorial objects that can be used for generating infinite
families of words. In the context of Information theory, injective morphisms are called …
Height-bounded Lempel-Ziv encodings
We introduce height-bounded LZ encodings (LZHB), a new family of compressed
representations that is a variant of Lempel-Ziv parsings with a focus on allowing fast access …
Computing NP-hard repetitiveness measures via MAX-SAT
Repetitiveness measures reveal profound characteristics of datasets, and give rise to
compressed data structures and algorithms working in compressed space. Alas, the …
On the hardness of smallest RLSLPs and collage systems
A Kawamoto, I Tomohiro, D Köppl… - 2024 Data …, 2024 - ieeexplore.ieee.org
Akiyoshi Kawamoto, Tomohiro I, Dominik Köppl, …
LZRR: LZ77 parsing with right reference
T Nishimoto, Y Tabei - Information and Computation, 2022 - Elsevier
Lossless data compression has been widely studied in computer science. One of the most
widely used lossless data compressions is Lempel-Ziv (LZ) 77 parsing, which achieves a …