From single- to multi-modal remote sensing imagery interpretation: A survey and taxonomy
Modality is a source or form of information. Through various modal information, humans can
perceive the world from multiple perspectives. Simultaneously, the observation of remote …
Vision-language models in remote sensing: Current progress and future trends
The remarkable achievements of ChatGPT and Generative Pre-trained Transformer 4 (GPT-
4) have sparked a wave of interest and research in the field of large language models …
Image retrieval from remote sensing big data: A survey
The blooming proliferation of aeronautics and astronautics platforms, together with the ever-
increasing remote sensing imaging sensors on these platforms, has led to the formation of …
RemoteCLIP: A vision language foundation model for remote sensing
General-purpose foundation models have led to recent breakthroughs in artificial
intelligence (AI). In remote sensing, self-supervised learning (SSL) and masked image …
Remote sensing cross-modal text-image retrieval based on global and local information
Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent
research hotspot due to its ability of enabling fast and flexible information extraction on …
Exploring a fine-grained multiscale method for cross-modal remote sensing image retrieval
Remote sensing (RS) cross-modal text-image retrieval has attracted extensive attention for
its advantages of flexible input and efficient query. However, traditional methods ignore the …
RSGPT: A remote sensing vision language model and benchmark
The emergence of large-scale large language models, with GPT-4 as a prominent example,
has significantly propelled the rapid advancement of artificial general intelligence and …
Parameter-efficient transfer learning for remote sensing image-text retrieval
Vision-and-language pretraining (VLP) models have experienced a surge in popularity
recently. By fine-tuning them on specific datasets, significant performance improvements …
A lightweight multi-scale crossmodal text-image retrieval method in remote sensing
Remote sensing (RS) crossmodal text-image retrieval has become a research hotspot in
recent years for its application in semantic localization. However, since multiple inferences …
Bi-modal transformer-based approach for visual question answering in remote sensing imagery
Recently, vision-language models based on transformers are gaining popularity for joint
modeling of visual and textual modalities. In particular, they show impressive results when …