Large-scale multi-modal pre-trained models: A comprehensive survey
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …
Continual vision-language representation learning with off-diagonal information
Large-scale multi-modal contrastive learning frameworks like CLIP typically require a large
amount of image-text samples for training. However, these samples are always collected …
Structure-inducing pre-training
MBA McDermott, B Yap, P Szolovits… - Nature Machine …, 2023 - nature.com
Abstract Language model pre-training and the derived general-purpose methods have
reshaped machine learning research. However, there remains considerable uncertainty …