Techniques for inverted index compression
GE Pibiri, R Venturini - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
The data structure at the core of large-scale search engines is the inverted index, which is
essentially a collection of sorted integer sequences called inverted lists. Because of the …
MONI: a pangenomic index for finding maximal exact matches
Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store
thousands of human genomes on a commodity computer. Then Kuhnle et al. showed how to …
Fully functional static and dynamic succinct trees
G Navarro, K Sadakane - ACM Transactions on Algorithms (TALG), 2014 - dl.acm.org
We propose new succinct representations of ordinal trees and match various space/time
lower bounds. It is known that any n-node static tree can be represented in 2n + o(n) bits so …
Wavelet trees for all
G Navarro - Journal of Discrete Algorithms, 2014 - Elsevier
The wavelet tree is a versatile data structure that serves a number of purposes, from string
processing to computational geometry. It can be regarded as a device that represents a …
[Book] Genome-scale algorithm design
High-throughput sequencing has revolutionised the field of biological sequence analysis. Its
application has enabled researchers to address important biological questions, often for the …
On compressing and indexing repetitive sequences
S Kreft, G Navarro - Theoretical Computer Science, 2013 - Elsevier
We introduce LZ-End, a new member of the Lempel–Ziv family of text compressors, which
achieves compression ratios close to those of LZ77 but is much faster at extracting arbitrary …
A learned approach to design compressed rank/select data structures
We address the problem of designing, implementing, and experimenting with compressed
data structures that support rank and select queries over a dictionary of integers. We shine a …
High throughput short read alignment via bi-directional BWT
The advancement of sequencing technologies has made it feasible for researchers to
consider many high-throughput biological applications. A core step of these applications is …
Self-indexed grammar-based compression
Self-indexes aim at representing text collections in a compressed format that allows
extracting arbitrary portions and also offers indexed searching on the collection. Current self …
The wavelet matrix: An efficient wavelet tree for large alphabets
The wavelet tree is a flexible data structure that permits representing sequences S[1, n] of
symbols over an alphabet of size σ, within compressed space and supporting a wide range …