Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

A brief survey on semantic segmentation with deep learning

S Hao, Y Zhou, Y Guo - Neurocomputing, 2020 - Elsevier
Semantic segmentation is a challenging task in computer vision. In recent years, the
performance of semantic segmentation has been greatly improved by using deep learning …

Neural collaborative filtering

X He, L Liao, H Zhang, L Nie, X Hu… - Proceedings of the 26th …, 2017 - dl.acm.org
In recent years, deep neural networks have yielded immense success on speech
recognition, computer vision and natural language processing. However, the exploration of …

Attentive collaborative filtering: Multimedia recommendation with item- and component-level attention

J Chen, H Zhang, X He, L Nie, W Liu… - Proceedings of the 40th …, 2017 - dl.acm.org
Multimedia content is dominating today's Web information. The nature of multimedia user-item
interactions is 1/0 binary implicit feedback (e.g., photo likes, video views, song …

Deep item-based collaborative filtering for top-n recommendation

F Xue, X He, X Wang, J Xu, K Liu, R Hong - ACM Transactions on …, 2019 - dl.acm.org
Item-based Collaborative Filtering (ICF) has been widely adopted in recommender systems
in industry, owing to its strength in user interest modeling and ease in online …

Survey on deep multi-modal data analytics: Collaboration, rivalry, and fusion

Y Wang - ACM Transactions on Multimedia Computing …, 2021 - dl.acm.org
With the development of web technology, multi-modal or multi-view data has surged as a
major stream for big data, where each modal/view encodes individual property of data …

Cross-modal retrieval with CNN visual features: A new baseline

Y Wei, Y Zhao, C Lu, S Wei, L Liu… - IEEE transactions on …, 2016 - ieeexplore.ieee.org
Recently, convolutional neural network (CNN) visual features have demonstrated their
powerful ability as a universal representation for various recognition tasks. In this paper …

Deep multimodal distance metric learning using click constraints for image ranking

J Yu, X Yang, F Gao, D Tao - IEEE transactions on cybernetics, 2016 - ieeexplore.ieee.org
How do we retrieve images accurately? Also, how do we rank a group of images precisely
and efficiently for specific queries? These problems are critical for researchers and …

Modality-invariant asymmetric networks for cross-modal hashing

Z Zhang, H Luo, L Zhu, G Lu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Cross-modal hashing has garnered considerable attention and gained great success in
many cross-media similarity search applications due to its prominent computational …