Pre-trained language models in biomedical domain: A systematic survey

B Wang, Q Xie, J Pei, Z Chen, P Tiwari, Z Li… - ACM Computing …, 2023 - dl.acm.org
Pre-trained language models (PLMs) have been the de facto paradigm for most natural
language processing tasks. This also benefits the biomedical domain: researchers from …

Evolutionary-scale prediction of atomic-level protein structure with a language model

Z Lin, H Akin, R Rao, B Hie, Z Zhu, W Lu, N Smetanin… - Science, 2023 - science.org
Recent advances in machine learning have leveraged evolutionary information in multiple
sequence alignments to predict protein structure. We demonstrate direct inference of full …

MSA transformer

RM Rao, J Liu, R Verkuil, J Meier… - International …, 2021 - proceedings.mlr.press
Unsupervised protein language models trained across millions of diverse sequences learn
structure and function of proteins. Protein language models studied to date have been …

Transformer protein language models are unsupervised structure learners

R Rao, J Meier, T Sercu, S Ovchinnikov, A Rives - bioRxiv, 2020 - biorxiv.org
Unsupervised contact prediction is central to uncovering physical, structural, and functional
constraints for protein structure determination and design. For decades, the predominant …

Generative models for graph-based protein design

J Ingraham, V Garg, R Barzilay… - Advances in neural …, 2019 - proceedings.neurips.cc
Engineered proteins offer the potential to solve many problems in biomedicine, energy, and
materials science, but creating designs that succeed is difficult in practice. A significant …

Self-supervised contrastive learning of protein representations by mutual information maximization

AX Lu, H Zhang, M Ghassemi, A Moses - bioRxiv, 2020 - biorxiv.org
Pretrained embedding representations of biological sequences which capture meaningful
properties can alleviate many problems associated with supervised learning in biology. We …

EpiDope: a deep neural network for linear B-cell epitope prediction

M Collatz, F Mock, E Barth, M Hölzer, K Sachse… - …, 2021 - academic.oup.com
Motivation: By binding to specific structures on antigenic proteins, the so-called epitopes,
B-cell antibodies can neutralize pathogens. The identification of B-cell epitopes is of great …

Prior knowledge facilitates low homologous protein secondary structure prediction with DSM distillation

Q Wang, J Wei, Y Zhou, M Lin, R Ren, S Wang… - …, 2022 - academic.oup.com
Motivation: Protein secondary structure prediction (PSSP) is one of the fundamental and
challenging problems in the field of computational biology. Accurate PSSP relies on …

Language modelling for biological sequences–curated datasets and baselines

JJ Almagro Armenteros, AR Johansen, O Winther… - bioRxiv, 2020 - biorxiv.org
Motivation: Language modelling (LM) on biological sequences is an emergent topic in the
field of bioinformatics. Current research has shown that language modelling of proteins can …

Evolution is all you need: phylogenetic augmentation for contrastive learning

AX Lu, AX Lu, A Moses - arXiv preprint arXiv:2012.13475, 2020 - arxiv.org
Self-supervised representation learning of biological sequence embeddings alleviates
computational resource constraints on downstream tasks while circumventing expensive …