Alignment-free sequence analysis and applications

J Ren, X Bai, YY Lu, K Tang, Y Wang… - Annual Review of …, 2018 - annualreviews.org
Genome and metagenome comparisons based on large amounts of next-generation
sequencing (NGS) data pose significant challenges for alignment-based approaches due to …

Ten quick tips for bioinformatics analyses using an Apache Spark distributed computing environment

D Chicco, U Ferraro Petrillo… - PLOS Computational …, 2023 - journals.plos.org
Some scientific studies involve huge amounts of bioinformatics data that cannot be analyzed
on personal computers usually employed by researchers for day-to-day activities but rather …

Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics

U Ferraro Petrillo, M Sorella, G Cattaneo… - BMC …, 2019 - Springer
Background: Distributed approaches based on the MapReduce programming paradigm
have started to be proposed in the Bioinformatics domain, due to the large amount of data …

FASTdoop: a versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications

U Ferraro Petrillo, G Roscigno, G Cattaneo… - …, 2017 - academic.oup.com
MapReduce Hadoop bioinformatics applications require the availability of special-purpose
routines to manage the input of sequence files. Unfortunately, the Hadoop framework does …

'Multi-SpaM': a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees

T Dencker, CA Leimeister, M Gerth… - NAR Genomics and …, 2020 - academic.oup.com
Word-based or 'alignment-free' methods for phylogeny inference have become popular in
recent years. These methods are much faster than traditional, alignment-based approaches …

Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms

U Ferraro Petrillo, G Roscigno, G Cattaneo… - …, 2018 - academic.oup.com
Motivation: Information theoretic and compositional/linguistic analysis of genomes have a
central role in bioinformatics, even more so since the associated methodologies are …

Bitmapaligner: bit-parallelism string matching with mapreduce and hadoop

M Aksa, J Rashid, MW Nisar, T Mahmood… - CMC-Comput Mater …, 2021 - researchgate.net
Advancements in next-generation sequencer (NGS) platforms have improved NGS
sequence data production and reduced the cost involved, which has resulted in the …

Failure recovery model in big data using the checkpoint approach

S Chorey, N Sahu - Journal of Integrated Science and …, 2023 - pubs.thesciencein.org
Distributed Stream Processing systems are becoming an increasingly crucial aspect of Big
Data processing platforms as customers grow ever more reliant on their capacity to deliver …

Alignment-free genomic analysis via a big data spark platform

U Ferraro Petrillo, F Palini, G Cattaneo… - …, 2021 - academic.oup.com
Motivation: Alignment-free distance and similarity functions (AF functions, for short) are a
well-established alternative to pairwise and multiple sequence alignments for many genomic …

Fast Recovery MapReduce (FAR-MR) to accelerate failure recovery in big data applications

Y Zhu, J Samsudin, R Kanagavelu, W Zhang… - The Journal of …, 2020 - Springer
Abstract: Existing Hadoop MapReduce fault tolerance strategy causes the computing jobs
suffering from high performance penalty during failure recovery. In this paper, we propose …