Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
Multimodal co-learning: Challenges, applications with datasets, recent advances and future directions
Multimodal deep learning systems that employ multiple modalities like text, image, audio,
video, etc., are showing better performance than individual modalities (ie, unimodal) …
video, etc., are showing better performance than individual modalities (ie, unimodal) …
Multimodal sentiment analysis based on fusion methods: A survey
Sentiment analysis is an emerging technology that aims to explore people's attitudes toward
an entity. It can be applied in a variety of different fields and scenarios, such as product …
an entity. It can be applied in a variety of different fields and scenarios, such as product …
Learning audio-visual speech representation by masked multimodal cluster prediction
Video recordings of speech contain correlated audio and visual information, providing a
strong signal for speech representation learning from the speaker's lip movements and the …
strong signal for speech representation learning from the speaker's lip movements and the …
Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis
In multimodal sentiment analysis (MSA), the performance of a model highly depends on the
quality of synthesized embeddings. These embeddings are generated from the upstream …
quality of synthesized embeddings. These embeddings are generated from the upstream …
Are multimodal transformers robust to missing modality?
Multimodal data collected from the real world are often imperfect due to missing modalities.
Therefore multimodal models that are robust against modal-incomplete data are highly …
Therefore multimodal models that are robust against modal-incomplete data are highly …
Disentangled representation learning for multimodal emotion recognition
Multimodal emotion recognition aims to identify human emotions from text, audio, and visual
modalities. Previous methods either explore correlations between different modalities or …
modalities. Previous methods either explore correlations between different modalities or …
Decoupled multimodal distilling for emotion recognition
Human multimodal emotion recognition (MER) aims to perceive human emotions via
language, visual and acoustic modalities. Despite the impressive performance of previous …
language, visual and acoustic modalities. Despite the impressive performance of previous …
Misa: Modality-invariant and-specific representations for multimodal sentiment analysis
Multimodal Sentiment Analysis is an active area of research that leverages multimodal
signals for affective understanding of user-generated videos. The predominant approach …
signals for affective understanding of user-generated videos. The predominant approach …
Multimodal prompting with missing modalities for visual recognition
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when
missing-modality occurs either during training or testing in real-world situations; and 2) when …
missing-modality occurs either during training or testing in real-world situations; and 2) when …