Bioinformatics applications on Apache Spark

R Guo, Y Zhao, Q Zou, X Fang, S Peng - GigaScience, 2018 - academic.oup.com
With the rapid development of next-generation sequencing technology, ever-increasing
quantities of genomic data pose a tremendous challenge to data processing. Therefore …

Cloud computing enabled big multi-omics data analytics

S Koppad, GV Gkoutos… - … and biology insights, 2021 - journals.sagepub.com
High-throughput experiments enable researchers to explore complex multifactorial diseases
through large-scale analysis of omics data. Challenges for such high-dimensional data sets …

SparkBWA: speeding up the alignment of high-throughput DNA sequencing data

JM Abuín, JC Pichel, TF Pena, J Amigo - PloS one, 2016 - journals.plos.org
Next-generation sequencing (NGS) technologies have led to a huge amount of genomic
data that need to be analyzed and interpreted. This fact has a huge impact on the DNA …

Variant calling parallelization on processor-in-memory architecture

D Lavenier, R Cimadomo… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
This paper introduces a new combination of software and hardware PIM (Process-in-
Memory) architecture to accelerate the variant calling genomic process. PIM translates into …

[BOOK][B] Data science in healthcare: Benefits, challenges and opportunities

Z Abedjan, N Boujemaa, S Campbell, P Casla… - 2019 - Springer
The advent of digital medical data has brought an exponential increase in information
available for each patient, allowing for novel knowledge generation methods to emerge …

[HTML][HTML] Scalability and validation of big data bioinformatics software

A Yang, M Troup, JWK Ho - Computational and structural biotechnology …, 2017 - Elsevier
This review examines two important aspects that are central to modern big data
bioinformatics analysis–software scalability and validity. We argue that not only are the …

Recommendations for performance optimizations when using GATK3.8 and GATK4

JR Heldenbrand, S Baheti, MA Bockol, TM Drucker… - BMC …, 2019 - Springer
Background: Use of the Genome Analysis Toolkit (GATK) continues to be the
standard practice in genomic variant calling in both research and the clinic. Recently the …

SparkGA: A Spark framework for cost effective, fast and accurate DNA analysis at scale

H Mushtaq, F Liu, C Costa, G Liu, P Hofstee… - Proceedings of the 8th …, 2017 - dl.acm.org
In recent years, the cost of NGS (Next Generation Sequencing) technology has dramatically
reduced, making it a viable method for diagnosing genetic diseases. The large amount of …

HISAT2 parallelization method based on Spark cluster

J Guo, J Gao, Z Liu - Journal of Physics: Conference Series, 2022 - iopscience.iop.org
Sequence alignment is one of the most important components in the Bioinformatics research
field. It is of great significance to discover the functional structure and genetic information of …

BiobankCloud: a platform for the secure storage, sharing, and processing of large biomedical data sets

A Bessani, J Brandt, M Bux, V Cogo, L Dimitrova… - … Data Management and …, 2016 - Springer
Biobanks store and catalog human biological material that is increasingly being digitized
using next-generation sequencing (NGS). There is, however, a computational bottleneck, as …