Multimodal Contrastive Learning for Remote Sensing Image Feature Extraction Based on Relaxed Positive Samples

Z Zhang, Q Li, W Jing, G He, L Zhu, S Gao - Sensors, 2024 - mdpi.com
Traditional multimodal contrastive learning brings text and its corresponding image closer
together as a positive pair, where the text typically consists of fixed sentence structures or …

Mind the modality gap: Towards a remote sensing vision-language model via cross-modal alignment

A Zavras, D Michail, B Demir, I Papoutsis - arXiv preprint arXiv:2402.09816, 2024 - arxiv.org
Deep Learning (DL) is undergoing a paradigm shift with the emergence of foundation
models, aptly named by their crucial, yet incomplete nature. In this work, we focus on …

GraphVL: graph-enhanced semantic modeling via vision-language models for generalized class discovery

B Solanki, AR Nair, M Singha… - Proceedings of the …, 2024 - dl.acm.org
Generalized Category Discovery (GCD) aims to cluster unlabeled images into known and
novel categories using labeled images from known classes. To address the challenge of …

RS3Lip: Consistency for remote sensing image classification on part embeddings using self-supervised learning and CLIP

A Jha, M Singha, A Bhattacharya, B Banerjee - Computer Vision and Image …, 2024 - Elsevier
Tackling domain and class generalization challenges remains a significant hurdle in the
realm of remote sensing (RS). Recently, large-scale pre-trained vision-language models …

Diffusion in Zero-Shot Learning for Environmental Audio

Y Sims, S Chalup, A Mendes - arXiv preprint arXiv:2412.03771, 2024 - arxiv.org
Zero-shot learning enables models to generalize to unseen classes by leveraging semantic
information, bridging the gap between training and testing sets with non-overlapping …