Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
A metaverse: Taxonomy, components, applications, and open challenges
SM Park, YG Kim - IEEE access, 2022 - ieeexplore.ieee.org
Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …
Merlot: Multimodal neural script knowledge models
As humans, we understand events in the visual world contextually, performing multimodal
reasoning across time to make inferences about the past, present, and future. We introduce …
Video summarization using deep neural networks: A survey
Video summarization technologies aim to create a concise and complete synopsis by
selecting the most informative parts of the video content. Several approaches have been …
End-to-end learning of visual representations from uncurated instructional videos
Annotating videos is cumbersome, expensive and not scalable. Yet, many strong video
models still rely on manually annotated data. With the recent introduction of the HowTo100M …
Multi-document summarization via deep learning techniques: A survey
Multi-document summarization (MDS) is an effective tool for information aggregation that
generates an informative and concise summary from a cluster of topic-related documents …
How2: a large-scale dataset for multimodal language understanding
In this paper, we introduce How2, a multimodal collection of instructional videos with English
subtitles and crowdsourced Portuguese translations. We also present integrated sequence …
Neural natural language generation: A survey on multilinguality, multimodality, controllability and learning
Developing artificial learning systems that can understand and generate natural language
has been one of the long-standing goals of artificial intelligence. Recent decades have …
Vision guided generative pre-trained language models for multimodal abstractive summarization
Multimodal abstractive summarization (MAS) models that summarize videos (vision
modality) and their corresponding transcripts (text modality) are able to extract the essential …