Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval
The ability to accurately model the fitness landscape of protein sequences is critical to a
wide range of applications, from quantifying the effects of human variants on disease …
Transformer-based protein generation with regularized latent space optimization
The development of powerful natural language models has improved the ability to learn
meaningful representations of protein sequences. In addition, advances in high-throughput …
ProteinGym: Large-scale benchmarks for protein fitness prediction and design
Predicting the effects of mutations in proteins is critical to many applications, from
understanding genetic disease to designing novel proteins to address our most pressing …
RITA: a study on scaling up generative protein sequence models
In this work we introduce RITA: a suite of autoregressive generative models for protein
sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences …
Learning protein fitness models from evolutionary and assay-labeled data
Machine learning-based models of protein fitness typically learn from either
unlabeled, evolutionarily related sequences or variant sequences with experimentally …
PoET: A generative model of protein families as sequences-of-sequences
T Truong Jr, T Bepler - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Generative protein language models are a natural way to design new proteins with desired
functions. However, current models are either difficult to direct to produce a protein from a …
ProGen2: exploring the boundaries of protein language models
Attention-based models trained on protein sequences have demonstrated incredible
success at classification and generation tasks relevant for artificial-intelligence-driven …
Using machine learning to predict the effects and consequences of mutations in proteins
Machine and deep learning approaches can leverage the increasingly available
massive datasets of protein sequences, structures, and mutational effects to predict variants …
Addressing data scarcity in protein fitness landscape analysis: A study on semi-supervised and deep transfer learning techniques
This paper presents a comprehensive analysis of deep transfer learning methods,
supervised methods, and semi-supervised methods in the context of protein fitness …