Computational solutions for omics data

B Berger, J Peng, M Singh - Nature reviews genetics, 2013 - nature.com
High-throughput experimental technologies are generating increasingly massive and
complex genomic data sets. The sheer enormity and heterogeneity of these data threaten to …

Data compression for sequencing data

S Deorowicz, S Grabowski - Algorithms for Molecular Biology, 2013 - Springer
Post-Sanger sequencing methods produce tons of data, and there is a general agreement
that the challenge to store and process them must be addressed with data compression. In …

Compression of FASTQ and SAM format sequencing data

JK Bonfield, MV Mahoney - PloS one, 2013 - journals.plos.org
Storage and transmission of the data produced by modern DNA sequencing instruments has
become a major concern, which prompted the Pistoia Alliance to pose the …

A reference-free algorithm for computational normalization of shotgun sequencing data

CT Brown, A Howe, Q Zhang, AB Pyrkosz… - arXiv preprint arXiv …, 2012 - arxiv.org
Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified single-cell
genomes, and metagenomes has enabled investigation of a wide range of organisms and …

A survey on data compression methods for biological sequences

M Hosseini, D Pratas, AJ Pinho - Information, 2016 - mdpi.com
The ever increasing growth of the production of high-throughput sequencing data poses a
serious challenge to the storage, processing and transmission of these data. As frequently …

MFCompress: a compression tool for FASTA and multi-FASTA data

AJ Pinho, D Pratas - Bioinformatics, 2014 - academic.oup.com
Motivation: The data deluge phenomenon is becoming a serious problem in most genomic
centers. To alleviate it, general purpose tools, such as gzip, are used to compress the data …

High-throughput DNA sequence data compression

Z Zhu, Y Zhang, Z Ji, S He, X Yang - Briefings in bioinformatics, 2015 - academic.oup.com
The exponential growth of high-throughput DNA sequence data has posed great challenges
to genomic data storage, retrieval and transmission. Compression is a critical tool to address …

Efficient DNA sequence compression with neural networks

M Silva, D Pratas, AJ Pinho - GigaScience, 2020 - academic.oup.com
Background: The increasing production of genomic data has led to an intensified need for
models that can cope efficiently with the lossless compression of DNA sequences. Important …

FRESCO: Referential compression of highly similar sequences

S Wandelt, U Leser - IEEE/ACM Transactions on Computational …, 2013 - ieeexplore.ieee.org
In many applications, sets of similar texts or sequences are of high importance. Prominent
examples are revision histories of documents or genomic sequences. Modern high …

LFQC: a lossless compression algorithm for FASTQ files

M Nicolae, S Pathak, S Rajasekaran - Bioinformatics, 2015 - academic.oup.com
Motivation: Next Generation Sequencing (NGS) technologies have revolutionized
genomic research by reducing the cost of whole genome sequencing. One of the biggest …