Large-scale multi-modal pre-trained models: A comprehensive survey
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …
Multimodal learning with transformers: A survey
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
FILIP: Fine-grained interactive language-image pre-training
Unsupervised large-scale vision-language pre-training has shown promising advances on
various downstream tasks. Existing methods often model the cross-modal interaction either …
Which industrial sectors are affected by artificial intelligence? A bibliometric analysis of trends and perspectives
L Espina-Romero, JG Noroño Sánchez… - Sustainability, 2023 - mdpi.com
In recent times, artificial intelligence (AI) has been generating a significant impact in various
industry sectors, which implies that companies must be ready to adjust to this promising start …
Wukong: A 100 million large-scale Chinese cross-modal pre-training benchmark
Abstract Vision-Language Pre-training (VLP) models have shown remarkable performance
on various downstream tasks. Their success heavily relies on the scale of pre-trained cross …
CTP: Towards vision-language continual pretraining via compatible momentum contrast and topology preservation
Abstract Vision-Language Pretraining (VLP) has shown impressive results on diverse
downstream tasks by offline training on large-scale datasets. Regarding the growing nature …
downstream tasks by offline training on large-scale datasets. Regarding the growing nature …
Transformers in speech processing: A survey
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …
EI-CLIP: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval
Abstract … recommendation, and marketing services. Extensive efforts have been made to
conquer the cross-modal retrieval problem in the general domain. When it comes to E …
M5Product: Self-harmonized contrastive learning for e-commercial multi-modal pretraining
Despite the potential of multi-modal pre-training to learn highly discriminative feature
representations from complementary data modalities, current progress is being slowed by …
Composed image retrieval using contrastive learning and task-oriented clip-based features
Given a query composed of a reference image and a relative caption, the Composed Image
Retrieval goal is to retrieve images visually similar to the reference one that integrates the …