Compressed full-text indexes
Full-text indexes provide fast substring search over large text collections. A serious problem
of these indexes has traditionally been their space consumption. A recent trend is to develop …
Survey and taxonomy of lossless graph compression and space-efficient graph representations
Various graphs such as web or social networks may contain up to trillions of edges.
Compressing such datasets can accelerate graph processing by reducing the amount of I/O …
The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds
P Ferragina, G Vinciguerra - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
We present the first learned index that supports predecessor, range queries and updates
within provably efficient time and space bounds in the worst case. In the (static) context of …
From theory to practice: Plug and play with succinct data structures
Engineering efficient implementations of compact and succinct structures is time-consuming
and challenging, since there is no standard library of easy-to-use, highly optimized, and …
Xenome—a tool for classifying reads from xenograft samples
Motivation: Shotgun sequence read data derived from xenograft material contains a mixture
of reads arising from the host and reads arising from the graft. Classifying the read mixture to …
Wavelet trees for all
G Navarro - Journal of Discrete Algorithms, 2014 - Elsevier
The wavelet tree is a versatile data structure that serves a number of purposes, from string
processing to computational geometry. It can be regarded as a device that represents a …
Succinct de Bruijn graphs
A Bowe, T Onodera, K Sadakane, T Shibuya - International workshop on …, 2012 - Springer
We propose a new succinct de Bruijn graph representation. If the de Bruijn graph of k-mers
in a DNA sequence of length N has m edges, it can be represented in 4m + o(m) bits. This is …
The theory and practice of genome sequence assembly
JT Simpson, M Pop - Annual review of genomics and human …, 2015 - annualreviews.org
The current genomic revolution was made possible by joint advances in genome
sequencing technologies and computational approaches for analyzing sequence data. The …
Succinct colored de Bruijn graphs
Motivation: In 2012, Iqbal et al. introduced the colored de Bruijn graph, a variant of
the classic de Bruijn graph, which is aimed at 'detecting and genotyping simple and complex …
On compressing and indexing repetitive sequences
S Kreft, G Navarro - Theoretical Computer Science, 2013 - Elsevier
We introduce LZ-End, a new member of the Lempel–Ziv family of text compressors, which
achieves compression ratios close to those of LZ77 but is much faster at extracting arbitrary …