Pre-trained language models in biomedical domain: A systematic survey
Pre-trained language models (PLMs) have been the de facto paradigm for most natural
language processing tasks. This also benefits the biomedical domain: researchers from …
Making the most of text semantics to improve biomedical vision–language processing
Multi-modal data abounds in biomedicine, such as radiology images and reports.
Interpreting this data at scale is essential for improving clinical care and accelerating clinical …
Contrastive learning of medical visual representations from paired images and text
Learning visual representations of medical images (e.g., X-rays) is core to medical image
understanding, but its progress has been held back by the scarcity of human annotations …
Large-scale domain-specific pretraining for biomedical vision-language processing
Contrastive pretraining on parallel image-text data has attained great success in vision-
language processing (VLP), as exemplified by CLIP and related methods. However, prior …
Clip in medical imaging: A comprehensive survey
Contrastive Language-Image Pre-training (CLIP), a straightforward yet effective pre-training
paradigm, successfully introduces semantic-rich text supervision to vision models and has …
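Several of the entries above build on CLIP-style contrastive pretraining. As context, here is a minimal NumPy sketch of the symmetric InfoNCE objective CLIP optimizes over a batch of matched image-text embedding pairs; the function name and the temperature value are illustrative, not taken from any of the cited works.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired embeddings.

    image_emb, text_emb: (N, D) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix; the diagonal holds the matched pairs.
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy_on_diagonal(l):
        # Log-softmax per row, with the diagonal entry as the target class.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))
```

With perfectly matched pairs the loss approaches zero, while mismatched pairs drive it up, which is the signal that pulls paired image and text embeddings together during pretraining.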
Multimodal variational auto-encoder based audio-visual segmentation
Abstract We propose an Explicit Conditional Multimodal Variational Auto-Encoder
(ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the …
Joint learning of localized representations from medical images and reports
Contrastive learning has proven effective for pre-training image models on unlabeled data
with promising results for tasks such as medical image classification. Using paired text (like …
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
Biomedical data is inherently multimodal, comprising physical measurements and natural
language narratives. A generalist biomedical AI model needs to simultaneously process …
A scoping review on multimodal deep learning in biomedical images and texts
Objective Computer-assisted diagnostic and prognostic systems of the future should be
capable of simultaneously processing multimodal data. Multimodal deep learning (MDL) …
S-clip: Semi-supervised vision-language learning using few specialist captions
Vision-language models, such as contrastive language-image pre-training (CLIP), have
demonstrated impressive results in natural image domains. However, these models often …