Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2023 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

A survey of automatic text summarization: Progress, process and challenges

MF Mridha, AA Lima, K Nur, SC Das, M Hasan… - IEEE …, 2021 - ieeexplore.ieee.org
With the evolution of the Internet and multimedia technology, the amount of text data has
increased exponentially. This text volume is a precious source of information and knowledge …

An intelligent video analysis method for abnormal event detection in intelligent transportation systems

S Wan, X Xu, T Wang, Z Gu - IEEE Transactions on Intelligent …, 2020 - ieeexplore.ieee.org
Intelligent transportation systems pervasively deploy thousands of video cameras. Analyzing
live video streams from these cameras is of significant importance to public safety. As …

Intelligent character recognition using fully convolutional neural networks

R Ptucha, FP Such, S Pillai, F Brockler, V Singh… - Pattern recognition, 2019 - Elsevier
The recognition of handwritten text is challenging as there are virtually infinite ways a human
can write the same message. Deep learning approaches for handwriting analysis have …

A long video caption generation algorithm for big video data retrieval

S Ding, S Qu, Y Xi, S Wan - Future Generation Computer Systems, 2019 - Elsevier
Videos captured by people are often tied to certain important moments of their lives. But with
the era of big data coming, the time required to retrieve and watch can be daunting. In this …

Move forward and tell: A progressive generator of video descriptions

Y Xiong, B Dai, D Lin - Proceedings of the European …, 2018 - openaccess.thecvf.com
We present an efficient framework that can generate a coherent paragraph to describe a
given video. Previous works on video captioning usually focus on video clips. They typically …

Multimodal abstractive summarization for how2 videos

S Palaskar, J Libovický, S Gella, F Metze - arXiv preprint arXiv:1906.07901, 2019 - arxiv.org
In this paper, we study abstractive summarization for open-domain videos. Unlike the
traditional text news summarization, the goal is less to "compress" text information but rather …

Multinet++: Multi-stream feature aggregation and geometric loss strategy for multi-task learning

S Chennupati, G Sistu, S Yogamani… - Proceedings of the …, 2019 - openaccess.thecvf.com
Multi-task learning is commonly used in autonomous driving for solving various visual
perception tasks. It offers significant benefits in terms of both performance and computational …

A survey of recent work on video summarization: approaches and techniques

V Tiwari, C Bhatnagar - Multimedia Tools and Applications, 2021 - Springer
The volume of video data generated has seen an exponential growth over the years and
video summarization has emerged as a process that can facilitate efficient storage, quick …