Authors
Advait Balaji, Bryce Kille, Anthony D Kappell, Gene D Godbold, Madeline Diep, RA Leo Elworth, Zhiqin Qian, Dreycey Albin, Daniel J Nasko, Nidhi Shah, Mihai Pop, Santiago Segarra, Krista L Ternus, Todd J Treangen
Publication date
2022/6/20
Journal
Genome Biology
Volume
23
Issue
1
Pages
133
Publisher
BioMed Central
Description
The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need, we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show that our ensemble machine learning model can label protein-coding sequences with FunSoCs with high recall and precision. SeqScreen is a step towards a novel paradigm of functionally informed synthetic DNA screening and pathogen characterization, and it is available for download at www.gitlab.com/treangenlab/seqscreen.