Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Dynamic suffix array with polylogarithmic queries and updates

D Kempa, T Kociumaka - Proceedings of the 54th Annual ACM SIGACT …, 2022 - dl.acm.org
The suffix array SA[1..n] of a text T of length n is a permutation of {1, …, n} describing the
lexicographical ordering of suffixes of T and is considered to be one of the most important …

Searching and indexing genomic databases via kernelization

T Gagie, SJ Puglisi - Frontiers in Bioengineering and Biotechnology, 2015 - frontiersin.org
The rapid advance of DNA sequencing technologies has yielded databases of thousands of
genomes. To search and index these databases effectively, it is important that we take …

An upper bound and linear-space queries on the LZ-End parsing

D Kempa, B Saha - Proceedings of the 2022 Annual ACM-SIAM …, 2022 - SIAM
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …

Dynamic index and LZ factorization in compressed space

T Nishimoto, I Tomohiro, S Inenaga, H Bannai… - Discrete Applied …, 2020 - Elsevier
In this paper, we propose a new dynamic compressed index of O(w) space for a dynamic
text T, where w = O(min(z log N log∗ M, N)) is the size of the signature encoding of T, z is …

A space-optimal grammar compression

Y Takabatake, H Sakamoto - 25th Annual European …, 2017 - drops.dagstuhl.de
A grammar compression is a context-free grammar (CFG) deriving a single string
deterministically. For an input string of length N over an alphabet of size sigma, the smallest …

Indexing highly repetitive string collections

G Navarro - arXiv preprint arXiv:2004.02781, 2020 - arxiv.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Grammar-compressed self-index with Lyndon words

K Tsuruta, D Köppl, Y Nakashima, S Inenaga… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired
by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data …

Grammar index by induced suffix sorting

T Akagi, D Köppl, Y Nakashima, S Inenaga… - String Processing and …, 2021 - Springer
We propose a new compressed text index built upon a grammar compression based on
induced suffix sorting (Nunes et al., DCC'18). We show that this grammar exhibits a locality …

Linear-size CDAWG: New repetition-aware indexing and grammar compression

T Takagi, K Goto, Y Fujishige, S Inenaga… - … Symposium on String …, 2017 - Springer
In this paper, we propose a novel approach to combine compact directed acyclic word
graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index …