Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities

Y Song, T Wang, P Cai, SK Mondal… - ACM Computing Surveys, 2023 - dl.acm.org
Few-shot learning (FSL) has emerged as an effective learning method and shows great
potential. Despite the recent creative works in tackling FSL tasks, learning valid information …

ULIP: Learning a unified representation of language, images, and point clouds for 3D understanding

L Xue, M Gao, C Xing, R Martín-Martín… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recognition capabilities of current state-of-the-art 3D models are limited by datasets with
a small number of annotated data and a pre-defined set of categories. In its 2D counterpart …

CrossPoint: Self-supervised cross-modal contrastive learning for 3D point cloud understanding

M Afham, I Dissanayake… - Proceedings of the …, 2022 - openaccess.thecvf.com
Manual annotation of large-scale point cloud dataset for varying tasks such as 3D object
classification, segmentation and detection is often laborious owing to the irregular structure …

Multimodality helps unimodality: Cross-modal few-shot learning with multimodal models

Z Lin, S Yu, Z Kuang, D Pathak… - Proceedings of the …, 2023 - openaccess.thecvf.com
The ability to quickly learn a new task with minimal instruction, known as few-shot learning, is
a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot …

Knowledge-guided semantic transfer network for few-shot image recognition

Z Li, H Tang, Z Peng, GJ Qi… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep learning-based models have been shown to outperform human beings in many
computer vision tasks with massive available labeled training data in learning. However …

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Few-shot classification with contrastive learning

Z Yang, J Wang, Y Zhu - European conference on computer vision, 2022 - Springer
A two-stage training paradigm consisting of sequential pre-training and meta-training stages
has been widely used in current few-shot learning (FSL) research. Many of these methods …

MultiBench: Multiscale benchmarks for multimodal representation learning

PP Liang, Y Lyu, X Fan, Z Wu, Y Cheng… - Advances in neural …, 2021 - ncbi.nlm.nih.gov
Learning multimodal representations involves integrating information from multiple
heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world …

Improved few-shot visual classification

P Bateni, R Goyal, V Masrani… - Proceedings of the …, 2020 - openaccess.thecvf.com
Few-shot learning is a fundamental task in computer vision that carries the promise of
alleviating the need for exhaustively labeled data. Most few-shot learning approaches to …