Run-length compressed indexes are superior for highly repetitive sequence collections

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org

Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Cited by 118 · Related articles · All 7 versions

[PDF] arxiv.org

Fully functional suffix trees and optimal text searching in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Journal of the ACM (JACM), 2020 - dl.acm.org

Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

Cited by 201 · Related articles · All 12 versions

[HTML] sciencedirect.com

[HTML] Wavelet trees for all

G Navarro - Journal of Discrete Algorithms, 2014 - Elsevier

The wavelet tree is a versatile data structure that serves a number of purposes, from string
processing to computational geometry. It can be regarded as a device that represents a …

Cited by 273 · Related articles · All 19 versions

[PDF] acm.org

Resolution of the Burrows-Wheeler transform conjecture

D Kempa, T Kociumaka - Communications of the ACM, 2022 - dl.acm.org

The Burrows-Wheeler Transform (BWT) is an invertible text transformation that
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …

Cited by 99 · Related articles · All 10 versions

[PDF] siam.org

Optimal-time text indexing in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Proceedings of the Twenty-Ninth Annual ACM …, 2018 - SIAM

Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

Cited by 135 · Related articles · All 13 versions

[PDF] sciencedirect.com

On compressing and indexing repetitive sequences

S Kreft, G Navarro - Theoretical Computer Science, 2013 - Elsevier

We introduce LZ-End, a new member of the Lempel–Ziv family of text compressors, which
achieves compression ratios close to those of LZ77 but is much faster at extracting arbitrary …

Cited by 178 · Related articles · All 8 versions

[PDF] siam.org

An upper bound and linear-space queries on the LZ-End parsing

D Kempa, B Saha - Proceedings of the 2022 Annual ACM-SIAM …, 2022 - SIAM

Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …

Cited by 23 · Related articles · All 6 versions

[PDF] uchile.cl

LZ77-like compression with fast random access

S Kreft, G Navarro - 2010 Data Compression Conference, 2010 - ieeexplore.ieee.org

We introduce an alternative Lempel-Ziv text parsing, LZ-End, that converges to the entropy
and in practice gets very close to LZ77. LZ-End forces sources to finish at the end of a …

Cited by 119 · Related articles · All 11 versions

[PDF] uchile.cl

Self-indexed grammar-based compression

F Claude, G Navarro - Fundamenta Informaticae, 2011 - content.iospress.com

Self-indexes aim at representing text collections in a compressed format that allows
extracting arbitrary portions and also offers indexed searching on the collection. Current self …

Cited by 117 · Related articles · All 15 versions

[HTML] sciencedirect.com

[HTML] Sensitivity of string compressors and repetitiveness measures

T Akagi, M Funakoshi, S Inenaga - Information and Computation, 2023 - Elsevier

The sensitivity of a string compression algorithm C asks how much the output size C (T) for
an input string T can increase when a single character edit operation is performed on T. This …

Cited by 28 · Related articles · All 6 versions