Alignment-free sequence analysis and applications
Genome and metagenome comparisons based on large amounts of next-generation
sequencing (NGS) data pose significant challenges for alignment-based approaches due to …
Ten quick tips for bioinformatics analyses using an Apache Spark distributed computing environment
D Chicco, U Ferraro Petrillo… - PLOS Computational …, 2023 - journals.plos.org
Some scientific studies involve huge amounts of bioinformatics data that cannot be analyzed
on personal computers usually employed by researchers for day-to-day activities but rather …
Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics
Background Distributed approaches based on the MapReduce programming paradigm
have started to be proposed in the Bioinformatics domain, due to the large amount of data …
FASTdoop: a versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications
MapReduce Hadoop bioinformatics applications require the availability of special-purpose
routines to manage the input of sequence files. Unfortunately, the Hadoop framework does …
'Multi-SpaM': a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees
Word-based or 'alignment-free' methods for phylogeny inference have become popular in
recent years. These methods are much faster than traditional, alignment-based approaches …
Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms
Motivation Information theoretic and compositional/linguistic analysis of genomes have a
central role in bioinformatics, even more so since the associated methodologies are …
Bitmapaligner: bit-parallelism string matching with MapReduce and Hadoop
Advancements in next-generation sequencer (NGS) platforms have improved NGS
sequence data production and reduced the cost involved, which has resulted in the …
Failure recovery model in big data using the checkpoint approach
Distributed Stream Processing systems are becoming an increasingly crucial aspect of Big
Data processing platforms as customers grow ever more reliant on their capacity to deliver …
Alignment-free genomic analysis via a big data spark platform
Motivation Alignment-free distance and similarity functions (AF functions, for short) are a well-
established alternative to pairwise and multiple sequence alignments for many genomic …
Fast Recovery MapReduce (FAR-MR) to accelerate failure recovery in big data applications
Abstract Existing Hadoop MapReduce fault tolerance strategy causes the computing jobs
suffering from high performance penalty during failure recovery. In this paper, we propose …