作者
Jintao Meng, Shengzhong Feng, Yanjie Wei
发表日期
2012
期刊
第五届全国生物信息学与系统生物学学术大会论文集
简介
正 Background: Current sequencing technology (Illumina Solexa, Applied Biosystems SoLiD, and Helicos Biosciences Heliscope etc.) allows one to read millions of 35 to 100 nucleotide sequences per hour. Due to experimental errors, gaps, and genomic repeats, a much higher coverage depth of 50-fold to 300-fold is needed for accurate assembly. These factors contribute to a 300-fold to 1000-fold increase in the number of reads, resulting in billions of reads to be processed, which significantly complicates the genome assembly problem. Methods: This paper first demonstrates a multi-step bi-directed graph for the problem of genome assembly. Genome can be recovered by merging semi-extended edges to fullextended edges or contigs. Then a small world asynchronous parallel (SWAP) model is proposed to realize edge merging over a distributed one-step bi-directed graph. SWAP model applies the Lock-Computation-Unlock scheme to each vertex's small world. Later, we implement an assembler named as Para-Assembler using the SWAP model. Given the number of processes p, the complexity of this problem is reduced to 0 (n/p) parallel compute time, 0 (n/p) communication round, and O (glog (g)/p) communication volume, here g is the length of genomes, and n is the number of nucleotide in all input reads. Results: Simulation results shows that Para-Assembler has a factor of 20 times speedup when the number of processors scales from 10 to 640. Conclusions: The proposed SWAP introduced local synchronization and global asynchronization mechanism to maximize the parallelism in the graph algorithm. Based on SWAP model, we …
学术搜索中的文章
J Meng, S Feng, Y Wei - 第五届全国生物信息学与系统生物学学术大会论文集, 2012