Multi-level attention networks for visual question answering

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com

Abstract Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

被引用次数：211 相关文章所有 8 个版本

[PDF] github.io

Graph neural networks: foundation, frontiers and applications

L Wu, P Cui, J Pei, L Zhao, X Guo - … of the 28th ACM SIGKDD Conference …, 2022 - dl.acm.org

The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …

被引用次数：447 相关文章所有 11 个版本

[PDF] thecvf.com

Seeing out of the box: End-to-end pre-training for vision-language representation learning

Z Huang, Z Zeng, Y Huang, B Liu… - Proceedings of the …, 2021 - openaccess.thecvf.com

We study on joint learning of Convolutional Neural Network (CNN) and Transformer for
vision-language pre-training (VLPT) which aims to learn cross-modal alignments from …

被引用次数：296 相关文章所有 6 个版本

Towards zero-shot learning: A brief review and an attention-based embedding network

GS Xie, Z Zhang, H Xiong, L Shao… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Zero-shot learning (ZSL), an emerging topic in recent years, targets at distinguishing unseen
class images by taking images from seen classes for training the classifier. Existing works …

被引用次数：31 相关文章所有 2 个版本

[PDF] thecvf.com

Attention on attention for image captioning

L Huang, W Wang, J Chen… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Attention mechanisms are widely used in current encoder/decoder frameworks of image
captioning, where a weighted average on encoded vectors is generated at each time step to …

被引用次数：1115 相关文章所有 9 个版本

[PDF] neurips.cc

Cross attention network for few-shot classification

R Hou, H Chang, B Ma, S Shan… - Advances in neural …, 2019 - proceedings.neurips.cc

Few-shot classification aims to recognize unlabeled samples from unseen classes given
only few labeled samples. The unseen classes and low-data problem make few-shot …

被引用次数：788 相关文章所有 12 个版本

[PDF] researchgate.net

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer

In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

被引用次数：221 相关文章所有 8 个版本

[PDF] thecvf.com

Transvpr: Transformer-based place recognition with multi-level attention aggregation

R Wang, Y Shen, W Zuo, S Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com

Visual place recognition is a challenging task for applications such as autonomous driving
navigation and mobile robot localization. Distracting elements presenting in complex scenes …

被引用次数：134 相关文章所有 5 个版本

[PDF] thecvf.com

Camp: Cross-modal adaptive message passing for text-image retrieval

Z Wang, X Liu, H Li, L Sheng, J Yan… - Proceedings of the …, 2019 - openaccess.thecvf.com

Text-image cross-modal retrieval is a challenging task in the field of language and vision.
Most previous approaches independently embed images and sentences into a joint …

被引用次数：369 相关文章所有 8 个版本

[PDF] thecvf.com

Relation-aware graph attention network for visual question answering

L Li, Z Gan, Y Cheng, J Liu - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com

In order to answer semantically-complicated questions about an image, a Visual Question
Answering (VQA) model needs to fully understand the visual scene in the image, especially …

被引用次数：432 相关文章所有 8 个版本