The multi-modal fusion in visual question answering: a review of attention mechanisms

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com
Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Multi-task learning with deep neural networks: A survey

M Crawshaw - arXiv preprint arXiv:2009.09796, 2020 - arxiv.org
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are
simultaneously learned by a shared model. Such approaches offer advantages like …
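The snippet describes the core idea of a shared model serving several tasks. A minimal sketch of hard parameter sharing is given below; this is an illustrative PyTorch example, not code from the survey, and the two-task setup and dimensions are assumptions.

```python
# Illustrative sketch of hard parameter sharing: one shared encoder feeds
# two hypothetical task-specific heads, and gradients from both task losses
# update the shared parameters.
import torch
import torch.nn as nn

class SharedMTLModel(nn.Module):
    def __init__(self, in_dim=128, hidden=256, n_classes_a=10, n_classes_b=5):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_a = nn.Linear(hidden, n_classes_a)  # task A head
        self.head_b = nn.Linear(hidden, n_classes_b)  # task B head

    def forward(self, x):
        h = self.shared(x)          # representation shared across tasks
        return self.head_a(h), self.head_b(h)

model = SharedMTLModel()
x = torch.randn(4, 128)
logits_a, logits_b = model(x)
# A joint objective is typically a weighted sum of per-task losses.
loss = nn.functional.cross_entropy(logits_a, torch.randint(0, 10, (4,))) \
     + 0.5 * nn.functional.cross_entropy(logits_b, torch.randint(0, 5, (4,)))
loss.backward()
```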

Multimodal co-attention transformer for survival prediction in gigapixel whole slide images

RJ Chen, MY Lu, WH Weng, TY Chen… - Proceedings of the …, 2021 - openaccess.thecvf.com
Survival outcome prediction is a challenging weakly-supervised and ordinal regression task
in computational pathology that involves modeling complex interactions within the tumor …

Deep modular co-attention networks for visual question answering

Z Yu, J Yu, Y Cui, D Tao, Q Tian - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Visual Question Answering (VQA) requires a fine-grained and simultaneous
understanding of both the visual content of images and the textual content of questions …
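As a rough illustration of the co-attention idea in the title (a sketch only, not the paper's modular co-attention network), question tokens can attend over image regions while image regions attend over the question; the feature dimensions below are assumptions.

```python
# Minimal co-attention sketch over pre-extracted question and image features.
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.q_over_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v_over_q = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, q_feats, v_feats):
        # Question tokens attend over image regions...
        q_att, _ = self.q_over_v(q_feats, v_feats, v_feats)
        # ...and image regions attend over question tokens ("co-attention").
        v_att, _ = self.v_over_q(v_feats, q_feats, q_feats)
        return q_att, v_att

q = torch.randn(2, 14, 512)   # hypothetical question token features
v = torch.randn(2, 36, 512)   # hypothetical image region features
q_out, v_out = CoAttentionBlock()(q, v)
```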

Changer: Feature interaction is what you need for change detection

S Fang, K Li, Z Li - IEEE Transactions on Geoscience and …, 2023 - ieeexplore.ieee.org
Change detection is an important tool for long-term Earth observation missions. It takes bi-
temporal images as input and predicts “where” the change has occurred. Different from other …
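A hedged sketch of the bi-temporal setup the abstract describes: a shared (Siamese) encoder over both time steps followed by a simple feature interaction, here an absolute difference. The paper's own interaction layers are more elaborate; the architecture below is an assumption for illustration only.

```python
# Toy bi-temporal change detector: shared encoder, naive feature interaction,
# per-pixel change logits answering "where" the change occurred.
import torch
import torch.nn as nn

class SimpleChangeDetector(nn.Module):
    def __init__(self, ch=3, feat=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(ch, feat, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(feat, 1, 1)  # 1-channel change map

    def forward(self, img_t1, img_t2):
        f1, f2 = self.encoder(img_t1), self.encoder(img_t2)  # shared weights
        interaction = torch.abs(f1 - f2)                      # feature interaction
        return self.head(interaction)

t1 = torch.randn(1, 3, 64, 64)  # image at time 1
t2 = torch.randn(1, 3, 64, 64)  # image at time 2
change_logits = SimpleChangeDetector()(t1, t2)  # shape (1, 1, 64, 64)
```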

Multimodal fusion with co-attention networks for fake news detection

Y Wu, P Zhan, Y Zhang, L Wang… - Findings of the association …, 2021 - aclanthology.org
Fake news that combines textual and visual content tells a more convincing story than text-only
content and can spread quickly on social media. People can be easily deceived by …

Comparing recognition performance and robustness of multimodal deep learning models for multimodal emotion recognition

W Liu, JL Qiu, WL Zheng, BL Lu - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Multimodal signals are powerful for emotion recognition since they can represent emotions
comprehensively. In this article, we compare the recognition performance and robustness of …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …
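As a concrete anchor for the survey's scope, the scaled dot-product attention that most neural attention models build on fits in a few lines; this is an illustrative sketch, and the tensor shapes are assumptions.

```python
# Scaled dot-product attention: compatibility scores between queries and keys
# are normalized into weights that select and modulate the values.
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, dim)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # compatibility scores
    weights = torch.softmax(scores, dim=-1)        # attention distribution
    return weights @ v                             # weighted sum of values

q = k = v = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(q, k, v)  # shape (2, 5, 64)
```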

Mukea: Multimodal knowledge extraction and accumulation for knowledge-based visual question answering

Y Ding, J Yu, B Liu, Y Hu, M Cui… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Knowledge-based visual question answering requires the ability to associate
external knowledge for open-ended cross-modal scene understanding. One limitation of …