Pre-trained language models in biomedical domain: A systematic survey

B Wang, Q Xie, J Pei, Z Chen, P Tiwari, Z Li… - ACM Computing …, 2023 - dl.acm.org
Pre-trained language models (PLMs) have been the de facto paradigm for most natural
language processing tasks. This also benefits the biomedical domain: researchers from …

Making the most of text semantics to improve biomedical vision–language processing

B Boecking, N Usuyama, S Bannur, DC Castro… - European conference on …, 2022 - Springer
Multi-modal data abounds in biomedicine, such as radiology images and reports.
Interpreting this data at scale is essential for improving clinical care and accelerating clinical …

Contrastive learning of medical visual representations from paired images and text

Y Zhang, H Jiang, Y Miura… - Machine Learning …, 2022 - proceedings.mlr.press
Learning visual representations of medical images (e.g., X-rays) is core to medical image
understanding but its progress has been held back by the scarcity of human annotations …

Large-scale domain-specific pretraining for biomedical vision-language processing

S Zhang, Y Xu, N Usuyama, J Bagga… - arXiv preprint arXiv …, 2023 - researchgate.net
Contrastive pretraining on parallel image-text data has attained great success in vision-
language processing (VLP), as exemplified by CLIP and related methods. However, prior …

CLIP in medical imaging: A comprehensive survey

Z Zhao, Y Liu, H Wu, Y Li, S Wang, L Teng… - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive Language-Image Pre-training (CLIP), a straightforward yet effective pre-training
paradigm, successfully introduces semantic-rich text supervision to vision models and has …

Multimodal variational auto-encoder based audio-visual segmentation

Y Mao, J Zhang, M Xiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose an Explicit Conditional Multimodal Variational Auto-Encoder
(ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the …

Joint learning of localized representations from medical images and reports

P Müller, G Kaissis, C Zou, D Rueckert - European Conference on …, 2022 - Springer
Contrastive learning has proven effective for pre-training image models on unlabeled data
with promising results for tasks such as medical image classification. Using paired text (like …

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

S Zhang, Y Xu, N Usuyama, H Xu, J Bagga… - arXiv preprint arXiv …, 2023 - arxiv.org
Biomedical data is inherently multimodal, comprising physical measurements and natural
language narratives. A generalist biomedical AI model needs to simultaneously process …

A scoping review on multimodal deep learning in biomedical images and texts

Z Sun, M Lin, Q Zhu, Q Xie, F Wang, Z Lu… - Journal of Biomedical …, 2023 - Elsevier
Objective Computer-assisted diagnostic and prognostic systems of the future should be
capable of simultaneously processing multimodal data. Multimodal deep learning (MDL) …

S-CLIP: Semi-supervised vision-language learning using few specialist captions

S Mo, M Kim, K Lee, J Shin - Advances in Neural …, 2023 - proceedings.neurips.cc
Vision-language models, such as contrastive language-image pre-training (CLIP), have
demonstrated impressive results in natural image domains. However, these models often …