Bioinformatics applications on Apache Spark
With the rapid development of next-generation sequencing technology, ever-increasing
quantities of genomic data pose a tremendous challenge to data processing. Therefore …
Cloud computing enabled big multi-omics data analytics
S Koppad, GV Gkoutos… - … and biology insights, 2021 - journals.sagepub.com
High-throughput experiments enable researchers to explore complex multifactorial diseases
through large-scale analysis of omics data. Challenges for such high-dimensional data sets …
SparkBWA: speeding up the alignment of high-throughput DNA sequencing data
Next-generation sequencing (NGS) technologies have led to a huge amount of genomic
data that need to be analyzed and interpreted. This fact has a huge impact on the DNA …
Variant calling parallelization on processor-in-memory architecture
D Lavenier, R Cimadomo… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
This paper introduces a new combination of software and hardware PIM (Process-in-
Memory) architecture to accelerate the variant calling genomic process. PIM translates into …
[BOOK][B] Data science in healthcare: Benefits, challenges and opportunities
Z Abedjan, N Boujemaa, S Campbell, P Casla… - 2019 - Springer
The advent of digital medical data has brought an exponential increase in information
available for each patient, allowing for novel knowledge generation methods to emerge …
[HTML][HTML] Scalability and validation of big data bioinformatics software
This review examines two important aspects that are central to modern big data
bioinformatics analysis–software scalability and validity. We argue that not only are the …
Recommendations for performance optimizations when using GATK3.8 and GATK4
JR Heldenbrand, S Baheti, MA Bockol, TM Drucker… - BMC …, 2019 - Springer
Background: Use of the Genome Analysis Toolkit (GATK) continues to be the
standard practice in genomic variant calling in both research and the clinic. Recently the …
SparkGA: A Spark framework for cost effective, fast and accurate DNA analysis at scale
In recent years, the cost of NGS (Next Generation Sequencing) technology has dramatically
reduced, making it a viable method for diagnosing genetic diseases. The large amount of …
HISAT2 parallelization method based on Spark cluster
J Guo, J Gao, Z Liu - Journal of Physics: Conference Series, 2022 - iopscience.iop.org
Sequence alignment is one of the most important components in the Bioinformatics research
field. It is of great significance to discover the functional structure and genetic information of …
BiobankCloud: a platform for the secure storage, sharing, and processing of large biomedical data sets
Biobanks store and catalog human biological material that is increasingly being digitized
using next-generation sequencing (NGS). There is, however, a computational bottleneck, as …