Lab-bench: Measuring capabilities of language models for biology research

M Schilling-Wilhelmi, M Ríos-García, S Shabih… - arXiv preprint arXiv …, 2024 - arxiv.org

The vast majority of materials science knowledge exists in unstructured natural language,
yet structured data is crucial for innovative and systematic materials design. Traditionally, the …

Cited by 8 Related articles All 2 versions

Language agents achieve superhuman synthesis of scientific knowledge

MD Skarlinski, S Cox, JM Laurent, JD Braza… - arXiv preprint arXiv …, 2024 - arxiv.org

Language models are known to hallucinate incorrect information, and it is unclear if they are
sufficiently accurate and reliable for use in scientific research. We developed a rigorous …

Cited by 9 Related articles All 4 versions

Probing the limitations of multimodal language models for chemistry and materials research

N Alampara, M Schilling-Wilhelmi… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in artificial intelligence have sparked interest in scientific assistants
that could support researchers across the full spectrum of scientific workflows, from literature …

Cited by 2 Related articles All 2 versions

The virtual lab: AI agents design new SARS-CoV-2 nanobodies with experimental validation

K Swanson, W Wu, NL Bulaong, JE Pak, J Zou - bioRxiv, 2024 - biorxiv.org

Science frequently benefits from teams of interdisciplinary researchers. However, most
scientists don't have access to experts from multiple fields. Fortunately, large language …

Cited by 2 Related articles All 2 versions

HELM: Hierarchical Encoding for mRNA Language Modeling

M Yazdani-Jahromi, M Prakash, T Mansi… - arXiv preprint arXiv …, 2024 - arxiv.org

Messenger RNA (mRNA) plays a crucial role in protein synthesis, with its codon structure
directly impacting biological properties. While Language Models (LMs) have shown promise …

Cited by 1 Related articles All 3 versions

Aviary: training language agents on challenging scientific tasks

S Narayanan, JD Braza, RR Griffiths… - arXiv preprint arXiv …, 2024 - arxiv.org

Solving complex real-world tasks requires cycles of actions and observations. This is
particularly true in science, where tasks require many cycles of analysis, tool use, and …

Unveiling the power of language models in chemical research question answering

X Chen, T Wang, T Guo, K Guo, J Zhou, H Li… - Communications …, 2025 - nature.com

While the abilities of language models are thoroughly evaluated in areas like general
domains and biomedicine, academic chemistry remains less explored. Chemical QA tools …

SwiftDossier: Tailored Automatic Dossier for Drug Discovery with LLMs and Agents

G Fossi, Y Boulaimen, L Outemzabet, N Jeanray… - arXiv preprint arXiv …, 2024 - arxiv.org

The advancement of artificial intelligence algorithms has expanded their application to
several fields such as the biomedical domain. Artificial intelligence systems, including Large …

Cited by 1 Related articles All 2 versions

BioLP-bench: Measuring understanding of biological lab protocols by large language models

I Ivanov - bioRxiv, 2024 - biorxiv.org

Language models rapidly become more capable in many domains, including
biology. Both AI developers and policy makers are in need of benchmarks that evaluate their …

The HALoGen Benchmark: Fantastic LLM Hallucinations and Where To Find Them

A Ravichander, S Ghela, D Wadden, Y Choi - openreview.net

Despite their impressive ability to generate high-quality and fluent text, generative large
language models (LLMs) also produce hallucinations: statements that are misaligned with …