Data compression for sequencing data

S Deorowicz, S Grabowski - Algorithms for Molecular Biology, 2013 - Springer
Post-Sanger sequencing methods produce tons of data, and there is a general agreement
that the challenge to store and process them must be addressed with data compression. In …

High-throughput DNA sequence data compression

Z Zhu, Y Zhang, Z Ji, S He, X Yang - Briefings in Bioinformatics, 2015 - academic.oup.com
The exponential growth of high-throughput DNA sequence data has posed great challenges
to genomic data storage, retrieval and transmission. Compression is a critical tool to address …

Robust relative compression of genomes with random access

S Deorowicz, S Grabowski - Bioinformatics, 2011 - academic.oup.com
Motivation: Storing, transferring and maintaining genomic databases becomes a major
challenge because of the rapid technology progress in DNA sequencing and …

A survey on data compression methods for biological sequences

M Hosseini, D Pratas, AJ Pinho - Information, 2016 - mdpi.com
The ever increasing growth of the production of high-throughput sequencing data poses a
serious challenge to the storage, processing and transmission of these data. As frequently …

Memory-efficient assembly using Flye

B Freire, S Ladra, JR Paramá - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
In the past decade, next-generation sequencing (NGS) enabled the generation of genomic
data in a cost-effective, high-throughput manner. The most recent third-generation …

GReEn: a tool for efficient compression of genome resequencing data

AJ Pinho, D Pratas, SP Garcia - Nucleic Acids Research, 2012 - academic.oup.com
Research in the genomic sciences is confronted with the volume of sequencing and
resequencing data increasing at a higher pace than that of data storage and communication …

Computing MEMs and Relatives on Repetitive Text Collections

G Navarro - arXiv preprint arXiv:2210.09914, 2022 - arxiv.org
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given
pattern $P[1..m]$ on a large repetitive text collection $T[1..n]$, which is represented as a …

Efficient DNA sequence compression with neural networks

M Silva, D Pratas, AJ Pinho - GigaScience, 2020 - academic.oup.com
Background The increasing production of genomic data has led to an intensified need for
models that can cope efficiently with the lossless compression of DNA sequences. Important …

Iterative dictionary construction for compression of large DNA data sets

S Kuruppu, B Beresford-Smith… - … /ACM transactions on …, 2011 - ieeexplore.ieee.org
Genomic repositories increasingly include individual as well as reference sequences, which
tend to share long identical and near-identical strings of nucleotides. However, the …

FRESCO: Referential compression of highly similar sequences

S Wandelt, U Leser - IEEE/ACM Transactions on Computational …, 2013 - ieeexplore.ieee.org
In many applications, sets of similar texts or sequences are of high importance. Prominent
examples are revision histories of documents or genomic sequences. Modern high …