Authors

Camille Marchet, Christina Boucher, Simon J Puglisi, Paul Medvedev, Mikaël Salson, Rayan Chikhi

Publication date

2020/12/16

Journal

Genome Research

Publisher

Cold Spring Harbor Lab

Description

High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches, yet such a feature would be highly useful to investigators. Toward this goal, in the last few years several computational approaches have been introduced to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.

Total citations

Cited by 99

Citations per year: 2020: 5, 2021: 19, 2022: 19, 2023: 35, 2024: 19

Scholar articles

Data structures based on k-mers for querying large collections of sequencing data sets

C Marchet, C Boucher, SJ Puglisi, P Medvedev… - Genome research, 2021

Cited by 99 · All 18 versions