- Academic resource search

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 127 · Related articles · All 2 versions

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 11 · Related articles

A metaverse: Taxonomy, components, applications, and open challenges

SM Park, YG Kim - IEEE Access, 2022 - ieeexplore.ieee.org

Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …

Cited by 1463 · Related articles · All 6 versions

Merlot: Multimodal neural script knowledge models

R Zellers, X Lu, J Hessel, Y Yu… - Advances in neural …, 2021 - proceedings.neurips.cc

As humans, we understand events in the visual world contextually, performing multimodal
reasoning across time to make inferences about the past, present, and future. We introduce …

Cited by 357 · Related articles · All 7 versions

Video summarization using deep neural networks: A survey

E Apostolidis, E Adamantidou, AI Metsai… - Proceedings of the …, 2021 - ieeexplore.ieee.org

Video summarization technologies aim to create a concise and complete synopsis by
selecting the most informative parts of the video content. Several approaches have been …

Cited by 212 · Related articles · All 8 versions

End-to-end learning of visual representations from uncurated instructional videos

A Miech, JB Alayrac, L Smaira… - Proceedings of the …, 2020 - openaccess.thecvf.com

Annotating videos is cumbersome, expensive and not scalable. Yet, many strong video
models still rely on manually annotated data. With the recent introduction of the HowTo100M …

Cited by 748 · Related articles · All 15 versions

Multi-document summarization via deep learning techniques: A survey

C Ma, WE Zhang, M Guo, H Wang, QZ Sheng - ACM Computing Surveys, 2022 - dl.acm.org

Multi-document summarization (MDS) is an effective tool for information aggregation that
generates an informative and concise summary from a cluster of topic-related documents …

Cited by 132 · Related articles · All 9 versions

How2: a large-scale dataset for multimodal language understanding

R Sanabria, O Caglayan, S Palaskar, D Elliott… - arXiv preprint arXiv …, 2018 - arxiv.org

In this paper, we introduce How2, a multimodal collection of instructional videos with English
subtitles and crowdsourced Portuguese translations. We also present integrated sequence …

Cited by 281 · Related articles · All 9 versions

Neural natural language generation: A survey on multilinguality, multimodality, controllability and learning

E Erdem, M Kuyu, S Yagcioglu, A Frank… - Journal of Artificial …, 2022 - jair.org

Developing artificial learning systems that can understand and generate natural language
has been one of the long-standing goals of artificial intelligence. Recent decades have …

Cited by 45 · Related articles · All 20 versions

Vision guided generative pre-trained language models for multimodal abstractive summarization

T Yu, W Dai, Z Liu, P Fung - arXiv preprint arXiv:2109.02401, 2021 - arxiv.org

Multimodal abstractive summarization (MAS) models that summarize videos (vision
modality) and their corresponding transcripts (text modality) are able to extract the essential …

Cited by 67 · Related articles · All 7 versions