Few-shot remote sensing image scene classification: Recent advances, new baselines, and future trends
Remote sensing image scene classification (RSI-SC) is crucial for various high-level
applications, including RSI retrieval, image captioning, and object detection. Deep learning …
AD-CLIP: Adapting domains in prompt space using CLIP
Although deep learning models have shown impressive performance on supervised
learning tasks, they often struggle to generalize well when the training (source) and test …
Advancements in Vision–Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in
artificial intelligence (AI), and the advancements in Vision–Language Models (VLMs) have …
StyLIP: Multi-scale style-conditioned prompt learning for CLIP-based domain generalization
Large-scale foundation models, such as CLIP, have demonstrated impressive zero-
shot generalization performance on downstream tasks, leveraging well-designed language …
Unknown Prompt the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization
We delve into Open Domain Generalization (ODG) marked by domain and category
shifts between training's labeled source and testing's unlabeled target domains. Existing …
Multimodal Contrastive Learning for Remote Sensing Image Feature Extraction Based on Relaxed Positive Samples
Z Zhang, Q Li, W Jing, G He, L Zhu, S Gao - Sensors, 2024 - mdpi.com
Traditional multimodal contrastive learning brings text and its corresponding image closer
together as a positive pair, where the text typically consists of fixed sentence structures or …
Laddering vision foundation model for remote sensing image change detection
Y Liu, G Zhou - Journal of Applied Remote Sensing, 2024 - spiedigitallibrary.org
This paper proposes a novel laddering vision foundation model for change detection (CD) of
remote sensing images. Current approaches have limitations in simultaneously extracting …
Frequency-Aware Multi-Modal Fine-Tuning for Few-Shot Open-Set Remote Sensing Scene Classification
Few-shot open-set recognition, as a new paradigm, leveraging a limited amount of
supervised data to identify specific Remote Sensing (RS) scene categories and generalize …
SegCLIP: Multimodal visual-language and prompt learning for high-resolution remote sensing semantic segmentation
S Zhang, B Zhang, Y Wu, H Zhou… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Remote sensing semantic segmentation is considered a key step in the intelligent
interpretation of high-resolution remote sensing (HRRS) images, with widespread …
Mind the modality gap: Towards a remote sensing vision-language model via cross-modal alignment
Deep Learning (DL) is undergoing a paradigm shift with the emergence of foundation
models, aptly named by their crucial, yet incomplete nature. In this work, we focus on …