Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?
As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …
The forward-forward algorithm: Some preliminary investigations
G Hinton - arXiv preprint arXiv:2212.13345, 2022 - arxiv.org
The aim of this paper is to introduce a new learning procedure for neural networks and to
demonstrate that it works well enough on a few small problems to be worth further …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
Rethinking semantic segmentation: A prototype view
Prevalent semantic segmentation solutions, despite their different network designs (FCN
based or attention based) and mask decoding strategies (parametric softmax based or pixel …
Text and code embeddings by contrastive pre-training
Text embeddings are useful features in many applications such as semantic search and
computing text similarity. Previous work typically trains models customized for different use …
An empirical study of training end-to-end vision-and-language transformers
Vision-and-language (VL) pre-training has proven to be highly effective on various
VL downstream tasks. While recent work has shown that fully transformer-based VL models …
Self-supervised learning for recommender systems: A survey
In recent years, neural architecture-based recommender systems have achieved
tremendous success, but they still fall short of expectation when dealing with highly sparse …
Graph neural networks: foundation, frontiers and applications
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …
Emerging properties in self-supervised vision transformers
In this paper, we question if self-supervised learning provides new properties to Vision
Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the …