Has multimodal learning delivered universal intelligence in healthcare? A comprehensive survey

Q Lin, Y Zhu, X Mei, L Huang, J Ma, K He, Z Peng… - Information …, 2024 - Elsevier
The rapid development of artificial intelligence has constantly reshaped the field of
intelligent healthcare and medicine. As a vital technology, multimodal learning has …

MAIRA-2: Grounded radiology report generation

S Bannur, K Bouzid, DC Castro, A Schwaighofer… - arXiv preprint arXiv …, 2024 - arxiv.org
Radiology reporting is a complex task requiring detailed medical image understanding and
precise language generation, for which generative multimodal models offer a promising …

MAIRA-1: A specialised large multimodal model for radiology report generation

SL Hyland, S Bannur, K Bouzid, DC Castro… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a radiology-specific multimodal model for the task of generating radiological
reports from chest X-rays (CXRs). Our work builds on the idea that large language model(s) …

Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data?

C Liu, Z Wan, H Wang, Y Chen, T Qaiser, C Jin… - arXiv preprint arXiv …, 2024 - arxiv.org
Medical Vision-Language Pre-training (MedVLP) has made significant progress in enabling
zero-shot tasks for medical image understanding. However, training MedVLP models …

MedImageInsight: An open-source embedding model for general domain medical imaging

NCF Codella, Y Jin, S Jain, Y Gu, HH Lee… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we present MedImageInsight, an open-source medical imaging embedding
model. MedImageInsight is trained on medical images with associated text and labels …

DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training

G Jimenez-Perez, P Osorio, J Cersovsky… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models (DMs) have emerged as powerful foundation models for a variety of tasks,
with a large focus in synthetic image generation. However, their requirement of large …

An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation

A Abdulaal, H Fry, N Montaña-Brown… - arXiv preprint arXiv …, 2024 - arxiv.org
Radiological services are experiencing unprecedented demand, leading to increased
interest in automating radiology report generation. Existing Vision-Language Models (VLMs) …

LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts

Z Wang, Y Sun, Z Li, X Yang, F Chen, H Liao - arXiv preprint arXiv …, 2024 - arxiv.org
Drafting radiology reports is a complex task requiring flexibility, where radiologists tailor
content to available information and particular clinical demands. However, most current …

M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation

J Park, S Kim, B Yoon, J Hyun, K Choi - arXiv preprint arXiv:2408.16213, 2024 - arxiv.org
The rapid evolution of artificial intelligence, especially in large language models (LLMs), has
significantly impacted various domains, including healthcare. In chest X-ray (CXR) analysis …

Overcoming data scarcity in biomedical imaging with a foundational multi-task model

R Schäfer, T Nicke, H Höfener, A Lange… - Nature Computational …, 2024 - nature.com
Foundational models, pretrained on a large scale, have demonstrated substantial success
across non-medical domains. However, training these models typically requires large …