Artificial intelligence for multimodal data integration in oncology

J Lipkova, RJ Chen, B Chen, MY Lu, M Barbieri… - Cancer cell, 2022 - cell.com
In oncology, the patient state is characterized by a whole spectrum of modalities, ranging
from radiology, histology, and genomics to electronic health records. Current artificial …

An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …
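The fusion pattern underlying such audio-visual methods can be illustrated with a small sketch: frame-aligned lip-region embeddings are concatenated with the noisy magnitude spectrogram and a recurrent network predicts a time-frequency mask. This is a generic mask-based setup under assumed feature dimensions, not a reproduction of any specific architecture surveyed in the paper above.

# Minimal sketch (assumed dims): mask-based audio-visual speech enhancement.
import torch
import torch.nn as nn

class AVEnhancer(nn.Module):
    def __init__(self, n_freq=257, visual_dim=128, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq + visual_dim, hidden, batch_first=True)
        self.mask_head = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag, visual_feat):
        # noisy_mag: (B, T, n_freq) magnitude spectrogram of the mixture
        # visual_feat: (B, T, visual_dim) lip-region embeddings, frame-aligned
        fused, _ = self.rnn(torch.cat([noisy_mag, visual_feat], dim=-1))
        mask = self.mask_head(fused)          # values in [0, 1]
        return mask * noisy_mag               # estimated target magnitude

model = AVEnhancer()
est = model(torch.rand(2, 100, 257), torch.rand(2, 100, 128))
print(est.shape)  # torch.Size([2, 100, 257])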

Revisiting skeleton-based action recognition

H Duan, Y Zhao, K Chen, D Lin… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Human skeleton, as a compact representation of human action, has received increasing
attention in recent years. Many skeleton-based action recognition methods adopt GCNs to …
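A minimal sketch of the kind of graph convolution such GCN-based methods apply to skeleton joints is shown below; the three-joint adjacency and the single-layer design are illustrative assumptions, not the model revisited in the paper above.

# Minimal sketch: one graph convolution over skeleton joints (ST-GCN style).
import torch
import torch.nn as nn

class SkeletonGCNLayer(nn.Module):
    def __init__(self, in_ch, out_ch, adjacency):
        super().__init__()
        # Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2
        a = adjacency + torch.eye(adjacency.size(0))
        d = a.sum(dim=1).pow(-0.5)
        self.register_buffer("A", d.unsqueeze(1) * a * d.unsqueeze(0))
        self.proj = nn.Linear(in_ch, out_ch)

    def forward(self, x):
        # x: (B, T, V, C) -> per-frame features for V skeleton joints
        return torch.relu(self.proj(torch.einsum("uv,btvc->btuc", self.A, x)))

# Toy 3-joint chain (e.g., shoulder-elbow-wrist) as an assumed adjacency.
A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
layer = SkeletonGCNLayer(3, 64, A)
print(layer(torch.rand(2, 16, 3, 3)).shape)  # torch.Size([2, 16, 3, 64])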

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

K Bayoudh, R Knani, F Hamdaoui, A Mtibaa - The Visual Computer, 2022 - Springer
The research progress in multimodal learning has grown rapidly over the last decade in
several areas, especially in computer vision. The growing potential of multimodal data …

STAR-Transformer: a spatio-temporal cross attention transformer for human action recognition

D Ahn, S Kim, H Hong, BC Ko - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In action recognition, although the combination of spatio-temporal videos and skeleton
features can improve the recognition performance, a separate model and balancing feature …
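The cross-modal interaction named in the entry above can be illustrated with a single cross-attention block in which one token stream queries the other (here, video patch tokens attending to skeleton tokens). The dimensions and the one-block design are assumptions for illustration, not the paper's architecture.

# Minimal sketch: cross-attention between video tokens and skeleton tokens.
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, video_tokens, skeleton_tokens):
        # video_tokens: (B, Nv, dim) queries; skeleton_tokens: (B, Ns, dim) keys/values
        q = self.norm_q(video_tokens)
        kv = self.norm_kv(skeleton_tokens)
        fused, _ = self.attn(q, kv, kv)
        return video_tokens + fused  # residual connection

block = CrossAttentionBlock()
out = block(torch.rand(2, 49, 256), torch.rand(2, 25, 256))
print(out.shape)  # torch.Size([2, 49, 256])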

Expansion-squeeze-excitation fusion network for elderly activity recognition

X Shu, J Yang, R Yan, Y Song - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org
This work focuses on elderly activity recognition, a challenging task due to the existence of
individual actions and human-object interactions in elderly activities …
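The squeeze-and-excitation style of fusion referenced in the title can be sketched as channel-wise recalibration of concatenated modality features; the expansion step and gating details of the paper are not reproduced here, and the feature dimensions are assumptions.

# Minimal sketch: SE-style channel gating over two fused modality features.
import torch
import torch.nn as nn

class SEFusion(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(2 * channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C) pooled features from two modalities
        x = torch.cat([feat_a, feat_b], dim=-1)
        return x * self.gate(x)  # channel-wise excitation of the fused vector

fusion = SEFusion(channels=128)
print(fusion(torch.rand(4, 128), torch.rand(4, 128)).shape)  # (4, 256)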

Delivering arbitrary-modal semantic segmentation

J Zhang, R Liu, H Shi, K Yang, S Reiß… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multimodal fusion can make semantic segmentation more robust. However, fusing an
arbitrary number of modalities remains underexplored. To delve into this problem, we create …
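The problem setting of the entry above, fusing however many modalities happen to be available, can be sketched as learned per-pixel weighting over a variable-length list of modality feature maps. This only illustrates the setting; it is not the architecture proposed in the paper.

# Minimal sketch: per-pixel softmax weighting over an arbitrary modality list.
import torch
import torch.nn as nn

class AnyModalFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, modality_feats):
        # modality_feats: list of (B, C, H, W) tensors, one per available modality
        x = torch.stack(modality_feats, dim=1)            # (B, M, C, H, W)
        b, m, c, h, w = x.shape
        scores = self.score(x.flatten(0, 1)).view(b, m, 1, h, w)
        weights = scores.softmax(dim=1)                   # weight each modality per pixel
        return (weights * x).sum(dim=1)                   # (B, C, H, W)

fusion = AnyModalFusion(channels=64)
feats = [torch.rand(2, 64, 32, 32) for _ in range(3)]     # e.g., RGB, depth, LiDAR
print(fusion(feats).shape)  # torch.Size([2, 64, 32, 32])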

Computer vision, IoT and data fusion for crop disease detection using machine learning: A survey and ongoing research

M Ouhami, A Hafiane, Y Es-Saady, M El Hajji… - Remote Sensing, 2021 - mdpi.com
Crop diseases constitute a serious issue in agriculture, affecting both quality and quantity of
agricultural production. Disease control has been an object of research in many scientific and …

Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

N Wu, S Jastrzebski, K Cho… - … Conference on Machine …, 2022 - proceedings.mlr.press
We hypothesize that due to the greedy nature of learning in multi-modal deep neural
networks, these models tend to rely on just one modality while under-fitting the other …
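The setting studied in the entry above is a multi-branch fusion model that can end up relying on a single branch. Below is a minimal two-branch late-fusion classifier with a simple drop-one-modality probe for such imbalance; the probe is only an illustrative diagnostic, not the utilization measure defined in the paper.

# Minimal sketch: late-fusion classifier plus a drop-one-modality probe.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, dim_a=32, dim_b=32, n_classes=5):
        super().__init__()
        self.branch_a = nn.Sequential(nn.Linear(dim_a, 64), nn.ReLU())
        self.branch_b = nn.Sequential(nn.Linear(dim_b, 64), nn.ReLU())
        self.head = nn.Linear(128, n_classes)

    def forward(self, xa, xb):
        return self.head(torch.cat([self.branch_a(xa), self.branch_b(xb)], dim=-1))

model = LateFusionClassifier()
xa, xb = torch.rand(8, 32), torch.rand(8, 32)
full = model(xa, xb)
without_b = model(xa, torch.zeros_like(xb))   # probe: how much does modality B matter?
print((full.argmax(-1) == without_b.argmax(-1)).float().mean())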

Provable dynamic fusion for low-quality multimodal data

Q Zhang, H Wu, C Zhang, Q Hu, H Fu… - International …, 2023 - proceedings.mlr.press
The inherent challenge of multimodal fusion is to precisely capture the cross-modal
correlation and flexibly conduct cross-modal interaction. To fully release the value of each …
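Dynamic fusion of the kind discussed above can be sketched as sample-wise weighting: each modality produces logits and a confidence score, and the fused prediction weights modalities per sample. This conveys the general idea only; it is not the provable quality-aware scheme derived in the paper, and the confidence heads are an assumption.

# Minimal sketch: confidence-weighted, sample-wise fusion of modality logits.
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, dims, n_classes):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d, n_classes) for d in dims])
        self.confs = nn.ModuleList([nn.Linear(d, 1) for d in dims])

    def forward(self, feats):
        # feats: list of (B, d_m) features, one per modality
        logits = torch.stack([h(f) for h, f in zip(self.heads, feats)], dim=1)  # (B, M, C)
        conf = torch.stack([c(f) for c, f in zip(self.confs, feats)], dim=1)    # (B, M, 1)
        weights = conf.softmax(dim=1)            # per-sample modality weights
        return (weights * logits).sum(dim=1)     # (B, C)

model = DynamicFusion(dims=[64, 128], n_classes=10)
out = model([torch.rand(4, 64), torch.rand(4, 128)])
print(out.shape)  # torch.Size([4, 10])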