Artificial intelligence for multimodal data integration in oncology

J Lipkova, RJ Chen, B Chen, MY Lu, M Barbieri… - Cancer cell, 2022 - cell.com
In oncology, the patient state is characterized by a whole spectrum of modalities, ranging
from radiology, histology, and genomics to electronic health records. Current artificial …

An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …
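The fusion pattern underlying such audio-visual methods can be illustrated with a small sketch: frame-aligned lip-region embeddings are concatenated with the noisy magnitude spectrogram and a recurrent network predicts a time-frequency mask. This is a generic mask-based setup under assumed feature dimensions, not a reproduction of any specific architecture surveyed in the paper above.

# Minimal sketch (assumed dims): mask-based audio-visual speech enhancement.
import torch
import torch.nn as nn

class AVEnhancer(nn.Module):
    def __init__(self, n_freq=257, visual_dim=128, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq + visual_dim, hidden, batch_first=True)
        self.mask_head = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag, visual_feat):
        # noisy_mag: (B, T, n_freq) magnitude spectrogram of the mixture
        # visual_feat: (B, T, visual_dim) lip-region embeddings, frame-aligned
        fused, _ = self.rnn(torch.cat([noisy_mag, visual_feat], dim=-1))
        mask = self.mask_head(fused)          # values in [0, 1]
        return mask * noisy_mag               # estimated target magnitude

model = AVEnhancer()
est = model(torch.rand(2, 100, 257), torch.rand(2, 100, 128))
print(est.shape)  # torch.Size([2, 100, 257])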

Revisiting skeleton-based action recognition

H Duan, Y Zhao, K Chen, D Lin… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Human skeleton, as a compact representation of human action, has received increasing
attention in recent years. Many skeleton-based action recognition methods adopt GCNs to …
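A minimal sketch of the kind of graph convolution such GCN-based methods apply to skeleton joints is shown below; the three-joint adjacency and the single-layer design are illustrative assumptions, not the model revisited in the paper above.

# Minimal sketch: one graph convolution over skeleton joints (ST-GCN style).
import torch
import torch.nn as nn

class SkeletonGCNLayer(nn.Module):
    def __init__(self, in_ch, out_ch, adjacency):
        super().__init__()
        # Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2
        a = adjacency + torch.eye(adjacency.size(0))
        d = a.sum(dim=1).pow(-0.5)
        self.register_buffer("A", d.unsqueeze(1) * a * d.unsqueeze(0))
        self.proj = nn.Linear(in_ch, out_ch)

    def forward(self, x):
        # x: (B, T, V, C) -> per-frame features for V skeleton joints
        return torch.relu(self.proj(torch.einsum("uv,btvc->btuc", self.A, x)))

# Toy 3-joint chain (e.g., shoulder-elbow-wrist) as an assumed adjacency.
A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
layer = SkeletonGCNLayer(3, 64, A)
print(layer(torch.rand(2, 16, 3, 3)).shape)  # torch.Size([2, 16, 3, 64])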

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

K Bayoudh, R Knani, F Hamdaoui, A Mtibaa - The Visual Computer, 2022 - Springer
The research progress in multimodal learning has grown rapidly over the last decade in
several areas, especially in computer vision. The growing potential of multimodal data …

STAR-Transformer: a spatio-temporal cross attention transformer for human action recognition

D Ahn, S Kim, H Hong, BC Ko - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In action recognition, although the combination of spatio-temporal videos and skeleton
features can improve the recognition performance, a separate model and balancing feature …
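The cross-modal interaction named in the entry above can be illustrated with a single cross-attention block in which one token stream queries the other (here, video patch tokens attending to skeleton tokens). The dimensions and the one-block design are assumptions for illustration, not the paper's architecture.

# Minimal sketch: cross-attention between video tokens and skeleton tokens.
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, video_tokens, skeleton_tokens):
        # video_tokens: (B, Nv, dim) queries; skeleton_tokens: (B, Ns, dim) keys/values
        q = self.norm_q(video_tokens)
        kv = self.norm_kv(skeleton_tokens)
        fused, _ = self.attn(q, kv, kv)
        return video_tokens + fused  # residual connection

block = CrossAttentionBlock()
out = block(torch.rand(2, 49, 256), torch.rand(2, 25, 256))
print(out.shape)  # torch.Size([2, 49, 256])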

Expansion-squeeze-excitation fusion network for elderly activity recognition

X Shu, J Yang, R Yan, Y Song - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org
This work focuses on elderly activity recognition, a challenging task due to the existence of
individual actions and human-object interactions in elderly activities …
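The squeeze-and-excitation style of fusion referenced in the title can be sketched as channel-wise recalibration of concatenated modality features; the expansion step and gating details of the paper are not reproduced here, and the feature dimensions are assumptions.

# Minimal sketch: SE-style channel gating over two fused modality features.
import torch
import torch.nn as nn

class SEFusion(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(2 * channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C) pooled features from two modalities
        x = torch.cat([feat_a, feat_b], dim=-1)
        return x * self.gate(x)  # channel-wise excitation of the fused vector

fusion = SEFusion(channels=128)
print(fusion(torch.rand(4, 128), torch.rand(4, 128)).shape)  # (4, 256)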

Delivering arbitrary-modal semantic segmentation

J Zhang, R Liu, H Shi, K Yang, S Reiß… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multimodal fusion can make semantic segmentation more robust. However, fusing an
arbitrary number of modalities remains underexplored. To delve into this problem, we create …
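The problem setting of the entry above, fusing however many modalities happen to be available, can be sketched as learned per-pixel weighting over a variable-length list of modality feature maps. This only illustrates the setting; it is not the architecture proposed in the paper.

# Minimal sketch: per-pixel softmax weighting over an arbitrary modality list.
import torch
import torch.nn as nn

class AnyModalFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, modality_feats):
        # modality_feats: list of (B, C, H, W) tensors, one per available modality
        x = torch.stack(modality_feats, dim=1)            # (B, M, C, H, W)
        b, m, c, h, w = x.shape
        scores = self.score(x.flatten(0, 1)).view(b, m, 1, h, w)
        weights = scores.softmax(dim=1)                   # weight each modality per pixel
        return (weights * x).sum(dim=1)                   # (B, C, H, W)

fusion = AnyModalFusion(channels=64)
feats = [torch.rand(2, 64, 32, 32) for _ in range(3)]     # e.g., RGB, depth, LiDAR
print(fusion(feats).shape)  # torch.Size([2, 64, 32, 32])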

Computer vision, IoT and data fusion for crop disease detection using machine learning: A survey and ongoing research

M Ouhami, A Hafiane, Y Es-Saady, M El Hajji… - Remote Sensing, 2021 - mdpi.com
Crop diseases constitute a serious issue in agriculture, affecting both quality and quantity of
agricultural production. Disease control has been an object of research in many scientific and …

Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

N Wu, S Jastrzebski, K Cho… - … Conference on Machine …, 2022 - proceedings.mlr.press
We hypothesize that due to the greedy nature of learning in multi-modal deep neural
networks, these models tend to rely on just one modality while under-fitting the other …
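The setting studied in the entry above is a multi-branch fusion model that can end up relying on a single branch. Below is a minimal two-branch late-fusion classifier with a simple drop-one-modality probe for such imbalance; the probe is only an illustrative diagnostic, not the utilization measure defined in the paper.

# Minimal sketch: late-fusion classifier plus a drop-one-modality probe.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, dim_a=32, dim_b=32, n_classes=5):
        super().__init__()
        self.branch_a = nn.Sequential(nn.Linear(dim_a, 64), nn.ReLU())
        self.branch_b = nn.Sequential(nn.Linear(dim_b, 64), nn.ReLU())
        self.head = nn.Linear(128, n_classes)

    def forward(self, xa, xb):
        return self.head(torch.cat([self.branch_a(xa), self.branch_b(xb)], dim=-1))

model = LateFusionClassifier()
xa, xb = torch.rand(8, 32), torch.rand(8, 32)
full = model(xa, xb)
without_b = model(xa, torch.zeros_like(xb))   # probe: how much does modality B matter?
print((full.argmax(-1) == without_b.argmax(-1)).float().mean())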

Provable dynamic fusion for low-quality multimodal data

Q Zhang, H Wu, C Zhang, Q Hu, H Fu… - International …, 2023 - proceedings.mlr.press
The inherent challenge of multimodal fusion is to precisely capture the cross-modal
correlation and flexibly conduct cross-modal interaction. To fully release the value of each …
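Dynamic fusion of the kind discussed above can be sketched as sample-wise weighting: each modality produces logits and a confidence score, and the fused prediction weights modalities per sample. This conveys the general idea only; it is not the provable quality-aware scheme derived in the paper, and the confidence heads are an assumption.

# Minimal sketch: confidence-weighted, sample-wise fusion of modality logits.
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, dims, n_classes):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d, n_classes) for d in dims])
        self.confs = nn.ModuleList([nn.Linear(d, 1) for d in dims])

    def forward(self, feats):
        # feats: list of (B, d_m) features, one per modality
        logits = torch.stack([h(f) for h, f in zip(self.heads, feats)], dim=1)  # (B, M, C)
        conf = torch.stack([c(f) for c, f in zip(self.confs, feats)], dim=1)    # (B, M, 1)
        weights = conf.softmax(dim=1)            # per-sample modality weights
        return (weights * logits).sum(dim=1)     # (B, C)

model = DynamicFusion(dims=[64, 128], n_classes=10)
out = model([torch.rand(4, 64), torch.rand(4, 128)])
print(out.shape)  # torch.Size([4, 10])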