Scaling laws for generative mixed-modal language models

A Aghajanyan, L Yu, A Conneau… - International …, 2023 - proceedings.mlr.press
Generative language models define distributions over sequences of tokens that can
represent essentially any combination of data modalities (e.g., any permutation of image …

Retrosynthesis prediction with an interpretable deep-learning framework based on molecular assembly tasks

Y Wang, C Pang, Y Wang, J Jin, J Zhang… - Nature …, 2023 - nature.com
Automating retrosynthesis with artificial intelligence expedites organic chemistry research in
digital laboratories. However, most existing deep-learning approaches are hard to explain …

Enhancing activity prediction models in drug discovery with the ability to understand human language

P Seidl, A Vall, S Hochreiter… - … on Machine Learning, 2023 - proceedings.mlr.press
Activity and property prediction models are the central workhorses in drug discovery and
materials sciences, but currently, they have to be trained or fine-tuned for new tasks. Without …

Scientific large language models: A survey on biological & chemical domains

Q Zhang, K Ding, T Lyv, X Wang, Q Yin… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have emerged as a transformative power in enhancing
natural language comprehension, representing a significant stride toward artificial general …

Bayesian optimization of catalysts with in-context learning

MC Ramos, SS Michtavy, MD Porosoff… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are able to do accurate classification with zero or only a few
examples (in-context learning). We show a prompting system that enables regression with …

COATI: Multimodal contrastive pretraining for representing and traversing chemical space

B Kaufman, EC Williams, C Underkoffler… - Journal of Chemical …, 2024 - ACS Publications
Creating a successful small molecule drug is a challenging multiparameter optimization
problem in an effectively infinite space of possible molecules. Generative models have …

Regression with large language models for materials and molecular property prediction

R Jacobs, MP Polak, LE Schultz, H Mahdavi… - arXiv preprint arXiv …, 2024 - arxiv.org
We demonstrate the ability of large language models (LLMs) to perform material and
molecular property regression tasks, a significant deviation from the conventional LLM use …

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

Y Zhang, X Chen, B Jin, S Wang, S Ji, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
In many scientific fields, large language models (LLMs) have revolutionized the way with
which text and other modalities of data (e.g., molecules and proteins) are dealt, achieving …

Lost in Translation: Chemical Language Models and the Misunderstanding of Molecule Structures

V Ganeeva, A Sakhovskiy, K Khrabrov… - Findings of the …, 2024 - aclanthology.org
The recent integration of chemistry with natural language processing (NLP) has advanced
drug discovery. Molecule representation in language models (LMs) is crucial in enhancing …

MolLM: A unified language model to integrate biomedical text with 2D and 3D molecular representations

X Tang, A Tran, J Tan, MB Gerstein - bioRxiv, 2023 - biorxiv.org
Motivation: The present paradigm of deep learning models for molecular representation
relies mostly on 1D or 2D formats, neglecting significant 3D structural information that offers …