Techniques for inverted index compression

GE Pibiri, R Venturini - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
The data structure at the core of large-scale search engines is the inverted index, which is
essentially a collection of sorted integer sequences called inverted lists. Because of the …

MONI: a pangenomic index for finding maximal exact matches

M Rossi, M Oliva, B Langmead, T Gagie… - Journal of …, 2022 - liebertpub.com
Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store
thousands of human genomes on a commodity computer. Then Kuhnle et al. showed how to …

Fully functional static and dynamic succinct trees

G Navarro, K Sadakane - ACM Transactions on Algorithms (TALG), 2014 - dl.acm.org
We propose new succinct representations of ordinal trees and match various space/time
lower bounds. It is known that any n-node static tree can be represented in 2n + o(n) bits so …

Wavelet trees for all

G Navarro - Journal of Discrete Algorithms, 2014 - Elsevier
The wavelet tree is a versatile data structure that serves a number of purposes, from string
processing to computational geometry. It can be regarded as a device that represents a …

Genome-scale algorithm design

V Mäkinen, D Belazzougui, F Cunial, AI Tomescu - 2015 - books.google.com
High-throughput sequencing has revolutionised the field of biological sequence analysis. Its
application has enabled researchers to address important biological questions, often for the …

On compressing and indexing repetitive sequences

S Kreft, G Navarro - Theoretical Computer Science, 2013 - Elsevier
We introduce LZ-End, a new member of the Lempel–Ziv family of text compressors, which
achieves compression ratios close to those of LZ77 but is much faster at extracting arbitrary …

A learned approach to design compressed rank/select data structures

A Boffa, P Ferragina, G Vinciguerra - ACM Transactions on Algorithms …, 2022 - dl.acm.org
We address the problem of designing, implementing, and experimenting with compressed
data structures that support rank and select queries over a dictionary of integers. We shine a …

High throughput short read alignment via bi-directional BWT

TW Lam, R Li, A Tam, S Wong, E Wu… - 2009 IEEE International …, 2009 - ieeexplore.ieee.org
The advancement of sequencing technologies has made it feasible for researchers to
consider many high-throughput biological applications. A core step of these applications is …

Self-indexed grammar-based compression

F Claude, G Navarro - Fundamenta Informaticae, 2011 - content.iospress.com
Self-indexes aim at representing text collections in a compressed format that allows
extracting arbitrary portions and also offers indexed searching on the collection. Current self …

The wavelet matrix: An efficient wavelet tree for large alphabets

F Claude, G Navarro, A Ordóñez - Information Systems, 2015 - Elsevier
The wavelet tree is a flexible data structure that permits representing sequences S[1, n] of
symbols over an alphabet of size σ, within compressed space and supporting a wide range …