MPCCT: Multimodal vision-language learning paradigm with context-based compact Transformer

C Chen, D Han, CC Chang - Pattern Recognition, 2024 - Elsevier
Transformer and its variants have become the preferred option for multimodal
vision-language paradigms. However, they struggle with tasks that demand high-dependency …

Deep Multimodal Data Fusion

F Zhao, C Zhang, B Geng - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal Artificial Intelligence (Multimodal AI), in general, involves various types of data
(e.g., images, texts, or data collected from different sensors), feature engineering (e.g. …

A multimodal hybrid parallel network intrusion detection model

S Shi, D Han, M Cui - Connection Science, 2023 - Taylor & Francis
With the rapid growth of Internet data traffic, the means of malicious attack become more
diversified. The single modal intrusion detection model cannot fully exploit the rich feature …

CLVIN: Complete language-vision interaction network for visual question answering

C Chen, D Han, X Shen - Knowledge-Based Systems, 2023 - Elsevier
The emergence of the Transformer optimizes the interactive modeling of multimodal
information in visual question answering (VQA) tasks, helping machines better understand …

Intelligent productivity transformation: corporate market demand forecasting with the aid of an AI virtual assistant

B Liu, M Li, Z Ji, H Li, J Luo - Journal of Organizational and End User …, 2024 - igi-global.com
With the penetration of deep learning technology into forecasting and decision support
systems, enterprises have an increasingly urgent need for accurate forecasting of time …

Multi-modal adaptive gated mechanism for visual question answering

Y Xu, L Zhang, X Shen - PLOS ONE, 2023 - journals.plos.org
Visual Question Answering (VQA) is a multimodal task that uses natural language to ask and
answer questions based on image content. For multimodal tasks, obtaining accurate …

BoostedDim attention: A novel data-driven approach to improving LiDAR-based lane detection

O Patil, BB Nair, R Soni, A Thayyilravi… - Ain Shams Engineering …, 2024 - Elsevier
Lane detection is a fundamental component of advanced driver assistance systems,
facilitating critical functionalities like Lane Keep/Change Assistance, Lane Departure …

Relational reasoning and adaptive fusion for visual question answering

X Shen, D Han, L Zong, Z Guo, J Hua - Applied Intelligence, 2024 - Springer
Visual relationship modeling plays an indispensable role in visual question answering
(VQA). VQA models need to fully understand the visual scene and positional relationships …

ARDN: Attention Re-distribution Network for Visual Question Answering

J Yi, D Han, C Chen, X Shen, L Zong - Arabian Journal for Science and …, 2024 - Springer
The Transformer-based architecture, with its efficient parallel computation, long-range
dependency modeling, and context-aware capabilities, has showcased remarkable …

Subgraph representation learning with self-attention and free adversarial training

D Qin, X Tang, J Lu - Applied Intelligence, 2024 - Springer
Due to its capacity to capture subgraph information within graph data, subgraph
representation learning has garnered considerable attention in recent years. However …