From single- to multi-modal remote sensing imagery interpretation: A survey and taxonomy

X Sun, Y Tian, W Lu, P Wang, R Niu, H Yu… - Science China Information …, 2023 - Springer
Modality is a source or form of information. Through various modal information, humans can
perceive the world from multiple perspectives. Simultaneously, the observation of remote …

Vision-language models in remote sensing: Current progress and future trends

X Li, C Wen, Y Hu, Z Yuan… - IEEE Geoscience and …, 2024 - ieeexplore.ieee.org
The remarkable achievements of ChatGPT and Generative Pre-trained Transformer 4 (GPT-4)
have sparked a wave of interest and research in the field of large language models …

Image retrieval from remote sensing big data: A survey

Y Li, J Ma, Y Zhang - Information Fusion, 2021 - Elsevier
The blooming proliferation of aeronautics and astronautics platforms, together with the
ever-increasing remote sensing imaging sensors on these platforms, has led to the formation of …

RemoteCLIP: A vision language foundation model for remote sensing

F Liu, D Chen, Z Guan, X Zhou, J Zhu… - … on Geoscience and …, 2024 - ieeexplore.ieee.org
General-purpose foundation models have led to recent breakthroughs in artificial
intelligence (AI). In remote sensing, self-supervised learning (SSL) and masked image …

Remote sensing cross-modal text-image retrieval based on global and local information

Z Yuan, W Zhang, C Tian, X Rong… - … on Geoscience and …, 2022 - ieeexplore.ieee.org
Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent
research hotspot due to its ability of enabling fast and flexible information extraction on …

Exploring a fine-grained multiscale method for cross-modal remote sensing image retrieval

Z Yuan, W Zhang, K Fu, X Li, C Deng, H Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Remote sensing (RS) cross-modal text-image retrieval has attracted extensive attention for
its advantages of flexible input and efficient query. However, traditional methods ignore the …

RSGPT: A remote sensing vision language model and benchmark

Y Hu, J Yuan, C Wen, X Lu, X Li - arXiv preprint arXiv:2307.15266, 2023 - arxiv.org
The emergence of large-scale large language models, with GPT-4 as a prominent example,
has significantly propelled the rapid advancement of artificial general intelligence and …

Parameter-efficient transfer learning for remote sensing image-text retrieval

Y Yuan, Y Zhan, Z Xiong - IEEE Transactions on Geoscience …, 2023 - ieeexplore.ieee.org
Vision-and-language pretraining (VLP) models have experienced a surge in popularity
recently. By fine-tuning them on specific datasets, significant performance improvements …

A lightweight multi-scale crossmodal text-image retrieval method in remote sensing

Z Yuan, W Zhang, X Rong, X Li, J Chen… - … on Geoscience and …, 2021 - ieeexplore.ieee.org
Remote sensing (RS) crossmodal text-image retrieval has become a research hotspot in
recent years for its application in semantic localization. However, since multiple inferences …

Bi-modal transformer-based approach for visual question answering in remote sensing imagery

Y Bazi, MM Al Rahhal, ML Mekhalfi… - … on Geoscience and …, 2022 - ieeexplore.ieee.org
Recently, vision-language models based on transformers are gaining popularity for joint
modeling of visual and textual modalities. In particular, they show impressive results when …