Bi-modal transformer-based approach for visual question answering in remote sensing imagery

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com

Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Cited by 166 Related articles All 8 versions

[HTML] RS-CLIP: Zero shot remote sensing scene classification via contrastive vision-language supervision

X Li, C Wen, Y Hu, N Zhou - … Journal of Applied Earth Observation and …, 2023 - Elsevier

Zero-shot remote sensing scene classification aims to solve the scene classification problem
on unseen categories and has attracted numerous research attention in the remote sensing …

Cited by 16 Related articles All 3 versions

GeoChat: Grounded large vision-language model for remote sensing

K Kuckreja, MS Danish, M Naseer… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent advancements in Large Vision-Language Models (VLMs) have shown great
promise in natural image domains allowing users to hold a dialogue about given visual …

Cited by 24 Related articles All 3 versions

RSGPT: A remote sensing vision language model and benchmark

Y Hu, J Yuan, C Wen, X Lu, X Li - arXiv preprint arXiv:2307.15266, 2023 - arxiv.org

The emergence of large-scale large language models, with GPT-4 as a prominent example,
has significantly propelled the rapid advancement of artificial general intelligence and …

Cited by 30 Related articles All 2 versions

Vision-language models in remote sensing: Current progress and future trends

X Li, C Wen, Y Hu, Z Yuan… - IEEE Geoscience and …, 2024 - ieeexplore.ieee.org

The remarkable achievements of ChatGPT and Generative Pre-trained Transformer 4 (GPT-4)
have sparked a wave of interest and research in the field of large language models …

Cited by 22 Related articles All 5 versions

A spatial hierarchical reasoning network for remote sensing visual question answering

Z Zhang, L Jiao, L Li, X Liu, P Chen… - … on Geoscience and …, 2023 - ieeexplore.ieee.org

For visual question answering on remote sensing (RSVQA), current methods scarcely
consider geospatial objects typically with large-scale differences and positional sensitive …

Cited by 21 Related articles All 2 versions

Self-supervised pretraining via multimodality images with transformer for change detection

Y Zhang, Y Zhao, Y Dong, B Du - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Self-supervised learning (SSL) has shown remarkable success in image representation
learning. Among these methods, masked image modeling and contrastive learning are the …

Cited by 17 Related articles All 2 versions

[HTML] Machine-to-machine visual dialoguing with ChatGPT for enriched textual image description

R Ricci, Y Bazi, F Melgani - Remote Sensing, 2024 - mdpi.com

Image captioning is a technique that enables the automatic extraction of natural language
descriptions about the contents of an image. On the one hand, information in the form of …

Cited by 4 Related articles All 4 versions

[HTML] RS-LLaVA: A large vision-language model for joint captioning and question answering in remote sensing imagery

Y Bazi, L Bashmal, MM Al Rahhal, R Ricci, F Melgani - Remote Sensing, 2024 - mdpi.com

In this paper, we delve into the innovative application of large language models (LLMs) and
their extension, large vision-language models (LVLMs), in the field of remote sensing (RS) …

Cited by 3 Related articles All 4 versions

Large language models for captioning and retrieving remote sensing images

JD Silva, J Magalhães, D Tuia, B Martins - arXiv preprint arXiv:2402.06475, 2024 - arxiv.org

Image captioning and cross-modal retrieval are examples of tasks that involve the joint
analysis of visual and linguistic information. In connection to remote sensing imagery, these …

Cited by 5 Related articles All 2 versions