TextRS: Deep bidirectional triplet network for matching text to remote sensing images

X Sun, Y Tian, W Lu, P Wang, R Niu, H Yu… - Science China Information …, 2023 - Springer

Modality is a source or form of information. Through various modal information, humans can
perceive the world from multiple perspectives. Simultaneously, the observation of remote …

Cited by 51 · 3 versions

Vision-language models in remote sensing: Current progress and future trends

X Li, C Wen, Y Hu, Z Yuan… - IEEE Geoscience and …, 2024 - ieeexplore.ieee.org

The remarkable achievements of ChatGPT and Generative Pre-trained Transformer 4 (GPT-4)
have sparked a wave of interest and research in the field of large language models …

Cited by 54 · 5 versions

Image retrieval from remote sensing big data: A survey

Y Li, J Ma, Y Zhang - Information Fusion, 2021 - Elsevier

The blooming proliferation of aeronautics and astronautics platforms, together with the
ever-increasing remote sensing imaging sensors on these platforms, has led to the formation of …

Cited by 198

RemoteCLIP: A vision language foundation model for remote sensing

F Liu, D Chen, Z Guan, X Zhou, J Zhu… - … on Geoscience and …, 2024 - ieeexplore.ieee.org

General-purpose foundation models have led to recent breakthroughs in artificial
intelligence (AI). In remote sensing, self-supervised learning (SSL) and masked image …

Cited by 121 · 3 versions

Remote sensing cross-modal text-image retrieval based on global and local information

Z Yuan, W Zhang, C Tian, X Rong… - … on Geoscience and …, 2022 - ieeexplore.ieee.org

Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become an urgent
research hotspot due to its ability of enabling fast and flexible information extraction on …

Cited by 116 · 3 versions

Exploring a fine-grained multiscale method for cross-modal remote sensing image retrieval

Z Yuan, W Zhang, K Fu, X Li, C Deng, H Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Remote sensing (RS) cross-modal text-image retrieval has attracted extensive attention for
its advantages of flexible input and efficient query. However, traditional methods ignore the …

Cited by 145 · 3 versions

RSGPT: A remote sensing vision language model and benchmark

Y Hu, J Yuan, C Wen, X Lu, X Li - arXiv preprint arXiv:2307.15266, 2023 - arxiv.org

The emergence of large-scale large language models, with GPT-4 as a prominent example,
has significantly propelled the rapid advancement of artificial general intelligence and …

Cited by 81 · 2 versions

Parameter-efficient transfer learning for remote sensing image-text retrieval

Y Yuan, Y Zhan, Z Xiong - IEEE Transactions on Geoscience …, 2023 - ieeexplore.ieee.org

Vision-and-language pretraining (VLP) models have experienced a surge in popularity
recently. By fine-tuning them on specific datasets, significant performance improvements …

Cited by 37 · 4 versions

A lightweight multi-scale crossmodal text-image retrieval method in remote sensing

Z Yuan, W Zhang, X Rong, X Li, J Chen… - … on Geoscience and …, 2021 - ieeexplore.ieee.org

Remote sensing (RS) crossmodal text-image retrieval has become a research hotspot in
recent years for its application in semantic localization. However, since multiple inferences …

Cited by 80 · 2 versions

Bi-modal transformer-based approach for visual question answering in remote sensing imagery

Y Bazi, MM Al Rahhal, ML Mekhalfi… - … on Geoscience and …, 2022 - ieeexplore.ieee.org

Recently, vision-language models based on transformers are gaining popularity for joint
modeling of visual and textual modalities. In particular, they show impressive results when …

Cited by 53 · 4 versions