The language of proteins: NLP, machine learning & protein sequences

D Ofer, N Brandes, M Linial - Computational and Structural Biotechnology …, 2021 - Elsevier
Natural language processing (NLP) is a field of computer science concerned with automated
text and language analysis. In recent years, following a series of breakthroughs in deep and …

Pre-trained language models in biomedical domain: A systematic survey

B Wang, Q Xie, J Pei, Z Chen, P Tiwari, Z Li… - ACM Computing …, 2023 - dl.acm.org
Pre-trained language models (PLMs) have been the de facto paradigm for most natural
language processing tasks. This also benefits the biomedical domain: researchers from …

Evolutionary-scale prediction of atomic-level protein structure with a language model

Z Lin, H Akin, R Rao, B Hie, Z Zhu, W Lu, N Smetanin… - Science, 2023 - science.org
Recent advances in machine learning have leveraged evolutionary information in multiple
sequence alignments to predict protein structure. We demonstrate direct inference of full …

Learning inverse folding from millions of predicted structures

C Hsu, R Verkuil, J Liu, Z Lin, B Hie… - International …, 2022 - proceedings.mlr.press
We consider the problem of predicting a protein sequence from its backbone atom
coordinates. Machine learning approaches to this problem to date have been limited by the …

Highly accurate protein structure prediction with AlphaFold

J Jumper, R Evans, A Pritzel, T Green, M Figurnov… - Nature, 2021 - nature.com
Proteins are essential to life, and understanding their structure can facilitate a mechanistic
understanding of their function. Through an enormous experimental effort [1–4], the …

Single-sequence protein structure prediction using a language model and deep learning

R Chowdhury, N Bouatta, S Biswas, C Floristean… - Nature …, 2022 - nature.com
AlphaFold2 and related computational systems predict protein structure using deep learning
and co-evolutionary relationships encoded in multiple sequence alignments (MSAs) …
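Co-evolution in an MSA is commonly illustrated by scoring pairs of alignment columns whose residues vary together. The snippet below is a minimal sketch, assuming plain mutual information between two columns with no sequence weighting, pseudocounts, or average-product correction (practical contact-prediction pipelines add these). It is not the method of the paper above, whose title indicates it predicts structure from a single sequence with a language model instead of an MSA; the function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    Minimal sketch: unweighted counts, no pseudocounts, and gap
    characters are treated as an ordinary symbol.
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    p_i = Counter(col_i)             # marginal counts for column i
    p_j = Counter(col_j)             # marginal counts for column j
    p_ij = Counter(zip(col_i, col_j))  # joint counts for the column pair
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

# Hypothetical toy alignment: columns 0 and 2 co-vary (A pairs with E,
# D pairs with R), while column 1 is fully conserved.
msa = ["AKE", "AKE", "DKR", "DKR"]
print(column_mutual_information(msa, 0, 2))  # ln 2, about 0.693
print(column_mutual_information(msa, 0, 1))  # 0.0
```

A fully conserved column scores zero against any other column, because knowing one residue tells you nothing new about a column that never changes; a perfectly co-varying pair with two equally frequent states scores ln 2.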

ProteinBERT: a universal deep-learning model of protein sequence and function

N Brandes, D Ofer, Y Peleg, N Rappoport… - …, 2022 - academic.oup.com
Self-supervised deep language modeling has shown unprecedented success across natural
language tasks, and has recently been repurposed to biological sequences. However …
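Self-supervised language modeling on proteins typically treats each residue as a token, hides some of them, and trains the model to recover the originals. The snippet below sketches only that input-corruption step, assuming the BERT recipe (select about 15% of positions; of those, 80% become a mask token, 10% a random residue, 10% stay unchanged). ProteinBERT's own corruption scheme may differ, and the function name, `<mask>` token, and example sequence are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "<mask>"                       # hypothetical mask token

def mask_sequence(seq, mask_rate=0.15, rng=None):
    """BERT-style corruption of a protein sequence for masked language modeling.

    Returns (tokens, targets): tokens is the corrupted input, one token per
    residue; targets maps each selected position to the original residue
    the model would be trained to predict.
    """
    rng = rng or random.Random()
    tokens = list(seq)
    n_select = max(1, round(mask_rate * len(tokens)))
    positions = rng.sample(range(len(tokens)), n_select)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        r = rng.random()
        if r < 0.8:
            tokens[pos] = MASK                     # 80%: replace with mask token
        elif r < 0.9:
            tokens[pos] = rng.choice(AMINO_ACIDS)  # 10%: random residue
        # remaining 10%: keep the original residue
    return tokens, targets

tokens, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ", rng=random.Random(0))
print(tokens)
print(targets)
```

The loss is computed only at the positions in `targets`, so the model must use the surrounding residues as context to infer each hidden one.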

Protein remote homology detection and structural alignment using deep learning

T Hamamsy, JT Morton, R Blackwell, D Berenberg… - Nature …, 2024 - nature.com
Exploiting sequence–structure–function relationships in biotechnology requires improved
methods for aligning proteins that have low sequence similarity to previously annotated …

ProtTrans: Toward understanding the language of life through self-supervised learning

A Elnaggar, M Heinzinger, C Dallago… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Computational biology and bioinformatics provide vast data gold-mines from protein
sequences, ideal for Language Models (LMs) taken from Natural Language Processing …

MSA Transformer

RM Rao, J Liu, R Verkuil, J Meier… - International …, 2021 - proceedings.mlr.press
Unsupervised protein language models trained across millions of diverse sequences learn
structure and function of proteins. Protein language models studied to date have been …