Machine learning-guided protein engineering

P Kouba, P Kohout, F Haddadi, A Bushuiev… - ACS …, 2023 - ACS Publications
Recent progress in engineering highly promising biocatalysts has increasingly involved
machine learning methods. These methods leverage existing experimental and simulation …

HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution

E Nguyen, M Poli, M Faizi, A Thomas… - Advances in neural …, 2024 - proceedings.neurips.cc
Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …

DNABERT-2: Efficient foundation model and benchmark for multi-species genome

Z Zhou, Y Ji, W Li, P Dutta, R Davuluri, H Liu - arXiv preprint arXiv …, 2023 - arxiv.org
Decoding the linguistic intricacies of the genome is a crucial problem in biology, and
pre-trained foundational models such as DNABERT and Nucleotide Transformer have made …

To transformers and beyond: large language models for the genome

ME Consens, C Dufault, M Wainberg, D Forster… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool
for tackling complex computational challenges. This review focuses on the transformative …

Caduceus: Bi-directional equivariant long-range DNA sequence modeling

Y Schiff, CH Kao, A Gokaslan, T Dao, A Gu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale sequence modeling has sparked rapid advances that now extend into biology
and genomics. However, modeling genomic sequences introduces challenges such as the …

Simple linear attention language models balance the recall-throughput tradeoff

S Arora, S Eyuboglu, M Zhang, A Timalsina… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent work has shown that attention-based language models excel at recall, the ability to
ground generations in tokens previously seen in context. However, the efficiency of attention …

BEND: Benchmarking DNA language models on biologically meaningful tasks

FI Marin, F Teufel, M Horlacher, D Madsen… - The Twelfth …, 2023 - openreview.net
The genome sequence contains the blueprint for governing cellular processes. While the
availability of genomes has vastly increased over the last decades, experimental annotation …

Progress and Opportunities of Foundation Models in Bioinformatics

Q Li, Z Hu, Y Wang, L Li, Y Fan, I King, L Song… - arXiv preprint arXiv …, 2024 - arxiv.org
Bioinformatics has witnessed a paradigm shift with the increasing integration of artificial
intelligence (AI), particularly through the adoption of foundation models (FMs). These AI …

DiscDiff: Latent Diffusion Model for DNA Sequence Generation

Z Li, Y Ni, WAV Beardall, G Xia, A Das, GB Stan… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces a novel framework for DNA sequence generation, comprising two key
components: DiscDiff, a Latent Diffusion Model (LDM) tailored for generating discrete DNA …

BEACON: Benchmark for Comprehensive RNA Tasks and Language Models

Y Ren, Z Chen, L Qiao, H Jing, Y Cai, S Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
RNA plays a pivotal role in translating genetic instructions into functional outcomes,
underscoring its importance in biological processes and disease mechanisms. Despite the …