Few-shot remote sensing image scene classification: Recent advances, new baselines, and future trends

C Qiu, X Zhang, X Tong, N Guan, X Yi, K Yang… - ISPRS Journal of …, 2024 - Elsevier
Remote sensing image scene classification (RSI-SC) is crucial for various high-level
applications, including RSI retrieval, image captioning, and object detection. Deep learning …

AD-CLIP: Adapting domains in prompt space using CLIP

M Singha, H Pal, A Jha… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Although deep learning models have shown impressive performance on supervised
learning tasks, they often struggle to generalize well when the training (source) and test …

Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques

L Tao, H Zhang, H Jing, Y Liu, D Yan, G Wei, X Xue - Remote Sensing, 2025 - mdpi.com
Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in
artificial intelligence (AI), and the advancements in Vision–Language Models (VLMs) have …

StyLIP: Multi-scale style-conditioned prompt learning for CLIP-based domain generalization

S Bose, A Jha, E Fini, M Singha… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large-scale foundation models, such as CLIP, have demonstrated impressive zero-shot
generalization performance on downstream tasks, leveraging well-designed language …

Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization

M Singha, A Jha, S Bose, A Nair… - Proceedings of the …, 2024 - openaccess.thecvf.com
We delve into Open Domain Generalization (ODG) marked by domain and category
shifts between training's labeled source and testing's unlabeled target domains. Existing …

Multimodal Contrastive Learning for Remote Sensing Image Feature Extraction Based on Relaxed Positive Samples

Z Zhang, Q Li, W Jing, G He, L Zhu, S Gao - Sensors, 2024 - mdpi.com
Traditional multimodal contrastive learning brings text and its corresponding image closer
together as a positive pair, where the text typically consists of fixed sentence structures or …

Laddering vision foundation model for remote sensing image change detection

Y Liu, G Zhou - Journal of Applied Remote Sensing, 2024 - spiedigitallibrary.org
This paper proposes a novel laddering vision foundation model for change detection (CD) of
remote sensing images. Current approaches have limitations in simultaneously extracting …

Frequency-Aware Multi-Modal Fine-Tuning for Few-Shot Open-Set Remote Sensing Scene Classification

J Zhang, Y Rao, X Huang, G Li… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Few-shot open-set recognition, as a new paradigm, leveraging a limited amount of
supervised data to identify specific Remote Sensing (RS) scene categories and generalize …

SegCLIP: Multimodal visual-language and prompt learning for high-resolution remote sensing semantic segmentation

S Zhang, B Zhang, Y Wu, H Zhou… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Remote sensing semantic segmentation is considered a key step in the intelligent
interpretation of high-resolution remote sensing (HRRS) images, with widespread …

Mind the modality gap: Towards a remote sensing vision-language model via cross-modal alignment

A Zavras, D Michail, B Demir, I Papoutsis - arXiv preprint arXiv:2402.09816, 2024 - arxiv.org
Deep Learning (DL) is undergoing a paradigm shift with the emergence of foundation
models, aptly named by their crucial, yet incomplete nature. In this work, we focus on …