Indexing highly repetitive string collections, part II: Compressed indexes
G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …
Data compression for sequencing data
S Deorowicz, S Grabowski - Algorithms for Molecular Biology, 2013 - Springer
Post-Sanger sequencing methods produce tons of data, and there is a general agreement
that the challenge to store and process them must be addressed with data compression. In …
Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data
White spruce (Picea glauca) is a dominant conifer of the boreal forests of North America, and
providing genomics resources for this commercially valuable tree will help improve forest …
BioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters
Large datasets can be screened for sequences from a specific organism, quickly and with
low memory requirements, by a data structure that supports time- and memory-efficient set …
A learned approach to design compressed rank/select data structures
We address the problem of designing, implementing, and experimenting with compressed
data structures that support rank and select queries over a dictionary of integers. We shine a …
Prefix-free parsing for building big BWTs
High-throughput sequencing technologies have led to explosive growth of genomic
databases; one of which will soon reach hundreds of terabytes. For many applications we …
Practical linear-time O(1)-workspace suffix sorting for constant alphabets
G Nong - ACM Transactions on Information Systems (TOIS), 2013 - dl.acm.org
This article presents an O(n)-time algorithm called SACA-K for sorting the suffixes of an
input string T[0, n-1] over an alphabet A[0, K-1]. The problem of sorting the suffixes of T is …
A survey of BWT variants for string collections
D Cenzato, Z Lipták - arXiv preprint arXiv:2202.13235, 2022 - arxiv.org
In recent years, the focus of bioinformatics research has moved from individual sequences to
collections of sequences. Given the fundamental role of the Burrows-Wheeler Transform …
Lightweight data indexing and compression in external memory
In this paper we describe algorithms for computing the Burrows-Wheeler Transform (bwt)
and for building (compressed) indexes in external memory. The innovative feature of our …
Sketching and sublinear data structures in genomics
Large-scale genomics demands computational methods that scale sublinearly with the
growth of data. We review several data structures and sketching techniques that have been …