- Academic resource search

Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org

Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Cited by 118 Related articles All 7 versions

Data compression for sequencing data

S Deorowicz, S Grabowski - Algorithms for Molecular Biology, 2013 - Springer

Post-Sanger sequencing methods produce tons of data, and there is a general agreement
that the challenge to store and process them must be addressed with data compression. In …

Cited by 124 Related articles All 13 versions

Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data

I Birol, A Raymond, SD Jackman, S Pleasance… - …, 2013 - academic.oup.com

White spruce (Picea glauca) is a dominant conifer of the boreal forests of North America, and
providing genomics resources for this commercially valuable tree will help improve forest …

Cited by 455 Related articles All 14 versions

BioBloom tools: fast, accurate and memory-efficient host species sequence screening using Bloom filters

J Chu, S Sadeghi, A Raymond, SD Jackman… - …, 2014 - academic.oup.com

Large datasets can be screened for sequences from a specific organism, quickly and with
low memory requirements, by a data structure that supports time- and memory-efficient set …

Cited by 126 Related articles All 11 versions

A learned approach to design compressed rank/select data structures

A Boffa, P Ferragina, G Vinciguerra - ACM Transactions on Algorithms …, 2022 - dl.acm.org

We address the problem of designing, implementing, and experimenting with compressed
data structures that support rank and select queries over a dictionary of integers. We shine a …

Cited by 27 Related articles All 4 versions

Prefix-free parsing for building big BWTs

C Boucher, T Gagie, A Kuhnle, B Langmead… - Algorithms for Molecular …, 2019 - Springer

High-throughput sequencing technologies have led to explosive growth of genomic
databases; one of which will soon reach hundreds of terabytes. For many applications we …

Cited by 84 Related articles All 32 versions

Practical linear-time O(1)-workspace suffix sorting for constant alphabets

G Nong - ACM Transactions on Information Systems (TOIS), 2013 - dl.acm.org

This article presents an O(n)-time algorithm called SACA-K for sorting the suffixes of an
input string T[0, n-1] over an alphabet A[0, K-1]. The problem of sorting the suffixes of T is …

Cited by 93 Related articles All 2 versions

A survey of BWT variants for string collections

D Cenzato, Z Lipták - arXiv preprint arXiv:2202.13235, 2022 - arxiv.org

In recent years, the focus of bioinformatics research has moved from individual sequences to
collections of sequences. Given the fundamental role of the Burrows-Wheeler Transform …

Cited by 17 Related articles All 11 versions

Lightweight data indexing and compression in external memory

P Ferragina, T Gagie, G Manzini - Algorithmica, 2012 - Springer

In this paper we describe algorithms for computing the Burrows-Wheeler Transform (BWT)
and for building (compressed) indexes in external memory. The innovative feature of our …

Cited by 107 Related articles All 17 versions

Sketching and sublinear data structures in genomics

G Marçais, B Solomon, R Patro… - Annual Review of …, 2019 - annualreviews.org

Large-scale genomics demands computational methods that scale sublinearly with the
growth of data. We review several data structures and sketching techniques that have been …

Cited by 47 Related articles All 4 versions