Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval

P Notin, M Dias, J Frazer… - International …, 2022 - proceedings.mlr.press
The ability to accurately model the fitness landscape of protein sequences is critical to a
wide range of applications, from quantifying the effects of human variants on disease …

Transformer-based protein generation with regularized latent space optimization

E Castro, A Godavarthi, J Rubinfien… - Nature Machine …, 2022 - nature.com
The development of powerful natural language models has improved the ability to learn
meaningful representations of protein sequences. In addition, advances in high-throughput …

ProteinGym: Large-scale benchmarks for protein fitness prediction and design

P Notin, A Kollasch, D Ritter… - Advances in …, 2024 - proceedings.neurips.cc
Predicting the effects of mutations in proteins is critical to many applications, from
understanding genetic disease to designing novel proteins to address our most pressing …

RITA: a study on scaling up generative protein sequence models

D Hesslow, N Zanichelli, P Notin, I Poli… - arXiv preprint arXiv …, 2022 - arxiv.org
In this work we introduce RITA: a suite of autoregressive generative models for protein
sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences …

ProteinGym: Large-scale benchmarks for protein design and fitness prediction

P Notin, AW Kollasch, D Ritter, L van Niekerk, S Paul… - bioRxiv, 2023 - ncbi.nlm.nih.gov
Predicting the effects of mutations in proteins is critical to many applications, from
understanding genetic disease to designing novel proteins that can address our most …

Learning protein fitness models from evolutionary and assay-labeled data

C Hsu, H Nisonoff, C Fannjiang, J Listgarten - Nature biotechnology, 2022 - nature.com
Machine learning-based models of protein fitness typically learn from either
unlabeled, evolutionarily related sequences or variant sequences with experimentally …

PoET: A generative model of protein families as sequences-of-sequences

T Truong Jr, T Bepler - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Generative protein language models are a natural way to design new proteins with desired
functions. However, current models are either difficult to direct to produce a protein from a …

ProGen2: exploring the boundaries of protein language models

E Nijkamp, JA Ruffolo, EN Weinstein, N Naik, A Madani - Cell systems, 2023 - cell.com
Attention-based models trained on protein sequences have demonstrated incredible
success at classification and generation tasks relevant for artificial-intelligence-driven …

Using machine learning to predict the effects and consequences of mutations in proteins

DJ Diaz, AV Kulikova, AD Ellington, CO Wilke - Current opinion in structural …, 2023 - Elsevier
Machine and deep learning approaches can leverage the increasingly available
massive datasets of protein sequences, structures, and mutational effects to predict variants …

Addressing data scarcity in protein fitness landscape analysis: A study on semi-supervised and deep transfer learning techniques

JA Barbero-Aparicio, A Olivares-Gil, JJ Rodríguez… - Information …, 2024 - Elsevier
This paper presents a comprehensive analysis of deep transfer learning methods,
supervised methods, and semi-supervised methods in the context of protein fitness …