The multi-modal fusion in visual question answering: a review of attention mechanisms
Abstract Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …
A co-attention based multi-modal fusion network for review helpfulness prediction
Current review helpfulness prediction (RHP) methods rely solely on textual features and meta features to predict review helpfulness, overlooking the informational value of …
CAAN: Context-aware attention network for visual question answering
C Chen, D Han, CC Chang - Pattern Recognition, 2022 - Elsevier
Understanding multimodal information is the key to visual question answering (VQA) tasks.
Most existing approaches use attention mechanisms to acquire fine-grained information …
CLVIN: Complete language-vision interaction network for visual question answering
C Chen, D Han, X Shen - Knowledge-Based Systems, 2023 - Elsevier
The emergence of the Transformer has improved interactive modeling of multimodal information in visual question answering (VQA) tasks, helping machines better understand …
Local self-attention in transformer for visual question answering
X Shen, D Han, Z Guo, C Chen, J Hua, G Luo - Applied Intelligence, 2023 - Springer
Abstract Visual Question Answering (VQA) is a multimodal task that requires models to
understand both textual and visual information. Various VQA models have applied the …
Change detection meets visual question answering
The Earth's surface is continually changing, and identifying changes plays an important role
in urban planning and sustainability. Although change detection techniques have been …
Learning visual question answering on controlled semantic noisy labels
Abstract Visual Question Answering (VQA) has made great progress recently due to the
increasing ability to understand and encode multi-modal inputs based on deep learning …
Co-attention fusion network for multimodal skin cancer diagnosis
Recently, multimodal image-based methods have shown great performance in skin cancer
diagnosis. These methods usually use convolutional neural networks (CNNs) to extract the …
Vlcdoc: Vision-language contrastive pre-training model for cross-modal document classification
Multimodal learning from document data has recently achieved great success, as it allows semantically meaningful features to be pre-trained as a prior for a learnable downstream task. In …
Cascaded feature fusion with multi-level self-attention mechanism for object detection
C Wang, H Wang - Pattern Recognition, 2023 - Elsevier
Object detection has been a challenging task due to the complexity and diversity of objects.
The emergence of the self-attention mechanism provides a new direction for feature fusion in object …
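The entries above repeatedly describe attention-based fusion of visual and textual features (co-attention, context-aware attention, local and multi-level self-attention). As a rough illustration of that shared pattern only, the following is a minimal sketch of a bidirectional co-attention fusion block in PyTorch; the module names, dimensions, and pooling choices are illustrative assumptions and do not reproduce the implementation of any cited paper.

```python
# Minimal co-attention fusion sketch (hypothetical shapes and names;
# not the method of any paper listed above).
import torch
import torch.nn as nn


class CoAttentionFusion(nn.Module):
    """Cross-attends question tokens and image regions, then fuses both views."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Text attends over image regions, and image regions attend over text.
        self.txt_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img_to_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, txt_feats: torch.Tensor, img_feats: torch.Tensor) -> torch.Tensor:
        # txt_feats: (batch, n_tokens, dim); img_feats: (batch, n_regions, dim)
        txt_ctx, _ = self.txt_to_img(txt_feats, img_feats, img_feats)  # text queries image
        img_ctx, _ = self.img_to_txt(img_feats, txt_feats, txt_feats)  # image queries text
        # Mean-pool each attended sequence and fuse into one joint representation.
        joint = torch.cat([txt_ctx.mean(dim=1), img_ctx.mean(dim=1)], dim=-1)
        return self.fuse(joint)  # (batch, dim) joint embedding for a downstream head


if __name__ == "__main__":
    # Random tensors stand in for encoded question tokens and detected regions.
    model = CoAttentionFusion()
    question = torch.randn(2, 14, 512)   # 14 question tokens
    regions = torch.randn(2, 36, 512)    # 36 image regions
    print(model(question, regions).shape)  # torch.Size([2, 512])
```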