Authors
Tobias H Olsen, Iain H Moal, Charlotte M Deane
Publication date
2022/1/1
Journal
Bioinformatics Advances
Volume
2
Issue
1
Pages
vbac046
Publisher
Oxford University Press
Description
Motivation
General protein language models have been shown to summarize the semantics of protein sequences into representations that are useful for state-of-the-art predictive methods. However, for antibody-specific problems, such as restoring residues lost due to sequencing errors, a model trained solely on antibodies may be more powerful. Antibodies are one of the few protein types where the volume of sequence data needed for such language models is available, e.g. in the Observed Antibody Space (OAS) database.
Results
Here, we introduce AbLang, a language model trained on the antibody sequences in the OAS database. We demonstrate the power of AbLang by using it to restore missing residues in antibody sequence data, a key issue with B-cell receptor repertoire sequencing: for example, over 40% of OAS sequences are missing the first 15 amino acids. AbLang …