Pre-trained language models in biomedical domain: A systematic survey
Pre-trained language models (PLMs) have been the de facto paradigm for most natural
language processing tasks. This also benefits the biomedical domain: researchers from …
Evolutionary-scale prediction of atomic-level protein structure with a language model
Recent advances in machine learning have leveraged evolutionary information in multiple
sequence alignments to predict protein structure. We demonstrate direct inference of full …
Transformer protein language models are unsupervised structure learners
Unsupervised contact prediction is central to uncovering physical, structural, and functional
constraints for protein structure determination and design. For decades, the predominant …
Generative models for graph-based protein design
Engineered proteins offer the potential to solve many problems in biomedicine, energy, and
materials science, but creating designs that succeed is difficult in practice. A significant …
Self-supervised contrastive learning of protein representations by mutual information maximization
Pretrained embedding representations of biological sequences which capture meaningful
properties can alleviate many problems associated with supervised learning in biology. We …
EpiDope: a deep neural network for linear B-cell epitope prediction
Motivation: By binding to specific structures on antigenic proteins, the so-called epitopes,
B-cell antibodies can neutralize pathogens. The identification of B-cell epitopes is of great …
Prior knowledge facilitates low homologous protein secondary structure prediction with DSM distillation
Motivation: Protein secondary structure prediction (PSSP) is one of the fundamental and
challenging problems in the field of computational biology. Accurate PSSP relies on …
Language modelling for biological sequences–curated datasets and baselines
Motivation: Language modelling (LM) on biological sequences is an emergent topic in the
field of bioinformatics. Current research has shown that language modelling of proteins can …
Evolution is all you need: phylogenetic augmentation for contrastive learning
Self-supervised representation learning of biological sequence embeddings alleviates
computational resource constraints on downstream tasks while circumventing expensive …