A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT
Recently, ChatGPT, along with DALL-E 2 and Codex, has been gaining significant attention
from society. As a result, many individuals have become interested in related resources and …
Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Multimodal learning with transformers: A survey
Transformer is a promising neural network learner and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
Prompt-aligned gradient for prompt tuning
Thanks to large pre-trained vision-language models (VLMs) like CLIP, we can craft a
zero-shot classifier by discrete prompt design, e.g., the confidence score of an image …
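The zero-shot recipe this snippet refers to is standard CLIP usage: embed the image and a set of class-name prompts, then softmax over the image-text similarity scores. Below is a minimal sketch using the Hugging Face transformers CLIP API; the checkpoint name, label set, prompt template, and input image are illustrative assumptions, not details from the paper (whose ProGrad method concerns prompt tuning, not this base classifier):

# Minimal CLIP zero-shot classifier sketch (illustrative; not the paper's ProGrad method).
# Assumes: pip install torch transformers pillow
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

classes = ["cat", "dog", "car"]                   # hypothetical label set
prompts = [f"a photo of a {c}" for c in classes]  # discrete prompt design
image = Image.open("example.jpg")                 # hypothetical input image

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax yields
# the per-class confidence the snippet mentions.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(classes, probs[0].tolist())))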
Pre-trained language models and their applications
Pre-trained language models have achieved striking success in natural language
processing (NLP), leading to a paradigm shift from supervised learning to pre-training …
Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm
Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted
unprecedented attention for its impressive zero-shot recognition ability and excellent …
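For context, the base objective that data-efficient CLIP variants build on is a symmetric image-text contrastive (InfoNCE) loss over a batch. Here is a minimal PyTorch sketch of that base objective only; the function name and tensor shapes are illustrative, and the paper's added supervision signals are deliberately omitted:

# CLIP-style symmetric image-text contrastive loss (illustrative sketch only;
# the surveyed paper adds extra supervision on top of this base objective).
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: (batch, dim) embeddings from the two encoders.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(logits.size(0))           # matched pairs lie on the diagonal
    loss_i = F.cross_entropy(logits, targets)        # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)    # text -> image direction
    return (loss_i + loss_t) / 2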
Scaling up vision-language pre-training for image captioning
In recent years, we have witnessed a significant performance boost in the image captioning
task based on vision-language pre-training (VLP). Scale is believed to be an important factor …
Large-scale multi-modal pre-trained models: A comprehensive survey
With the urgent demand for generalized deep models, many large pre-trained models have been
proposed, such as Bidirectional Encoder Representations from Transformers (BERT), Vision Transformer (ViT) …
Uni-Mol: A universal 3D molecular representation learning framework
Molecular representation learning (MRL) has gained tremendous attention due to its critical
role in learning from limited supervised data for applications like drug design. In most MRL …
Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts
The availability of large-scale image captioning and visual question answering datasets has
contributed significantly to recent successes in vision-and-language pre-training. However …