Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

A metaverse: Taxonomy, components, applications, and open challenges

SM Park, YG Kim - IEEE access, 2022 - ieeexplore.ieee.org
Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …

Merlot: Multimodal neural script knowledge models

R Zellers, X Lu, J Hessel, Y Yu… - Advances in neural …, 2021 - proceedings.neurips.cc
As humans, we understand events in the visual world contextually, performing multimodal
reasoning across time to make inferences about the past, present, and future. We introduce …

Video summarization using deep neural networks: A survey

E Apostolidis, E Adamantidou, AI Metsai… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Video summarization technologies aim to create a concise and complete synopsis by
selecting the most informative parts of the video content. Several approaches have been …

End-to-end learning of visual representations from uncurated instructional videos

A Miech, JB Alayrac, L Smaira… - Proceedings of the …, 2020 - openaccess.thecvf.com
Annotating videos is cumbersome, expensive and not scalable. Yet, many strong video
models still rely on manually annotated data. With the recent introduction of the HowTo100M …

Multi-document summarization via deep learning techniques: A survey

C Ma, WE Zhang, M Guo, H Wang, QZ Sheng - ACM Computing Surveys, 2022 - dl.acm.org
Multi-document summarization (MDS) is an effective tool for information aggregation that
generates an informative and concise summary from a cluster of topic-related documents …

How2: a large-scale dataset for multimodal language understanding

R Sanabria, O Caglayan, S Palaskar, D Elliott… - arXiv preprint arXiv …, 2018 - arxiv.org
In this paper, we introduce How2, a multimodal collection of instructional videos with English
subtitles and crowdsourced Portuguese translations. We also present integrated sequence …

Neural natural language generation: A survey on multilinguality, multimodality, controllability and learning

E Erdem, M Kuyu, S Yagcioglu, A Frank… - Journal of Artificial …, 2022 - jair.org
Developing artificial learning systems that can understand and generate natural language
has been one of the long-standing goals of artificial intelligence. Recent decades have …

Vision guided generative pre-trained language models for multimodal abstractive summarization

T Yu, W Dai, Z Liu, P Fung - arXiv preprint arXiv:2109.02401, 2021 - arxiv.org
Multimodal abstractive summarization (MAS) models that summarize videos (vision
modality) and their corresponding transcripts (text modality) are able to extract the essential …