Local self-attention in transformer for visual question answering

C Chen, D Han, CC Chang - Pattern Recognition, 2024 - Elsevier

Transformer and its variants have become the preferred option for multimodal vision-language
paradigms. However, they struggle with tasks that demand high-dependency …

Cited by 33 | Related articles | All 3 versions

Deep Multimodal Data Fusion

F Zhao, C Zhang, B Geng - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal Artificial Intelligence (Multimodal AI), in general, involves various types of data
(e.g., images, texts, or data collected from different sensors), feature engineering (e.g. …

Cited by 3 | Related articles

A multimodal hybrid parallel network intrusion detection model

S Shi, D Han, M Cui - Connection Science, 2023 - Taylor & Francis

With the rapid growth of Internet data traffic, the means of malicious attack become more
diversified. The single modal intrusion detection model cannot fully exploit the rich feature …

Cited by 32 | Related articles | All 3 versions

CLVIN: Complete language-vision interaction network for visual question answering

C Chen, D Han, X Shen - Knowledge-Based Systems, 2023 - Elsevier

The emergence of the Transformer optimizes the interactive modeling of multimodal
information in visual question answering (VQA) tasks, helping machines better understand …

Cited by 33 | Related articles | All 3 versions

Intelligent productivity transformation: corporate market demand forecasting with the aid of an AI virtual assistant

B Liu, M Li, Z Ji, H Li, J Luo - Journal of Organizational and End User …, 2024 - igi-global.com

With the penetration of deep learning technology into forecasting and decision support
systems, enterprises have an increasingly urgent need for accurate forecasting of time …

Cited by 25 | Related articles | All 4 versions

Multi-modal adaptive gated mechanism for visual question answering

Y Xu, L Zhang, X Shen - PLOS ONE, 2023 - journals.plos.org

Visual Question Answering (VQA) is a multimodal task that uses natural language to ask and
answer questions based on image content. For multimodal tasks, obtaining accurate …

Cited by 3 | Related articles | All 7 versions
