A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …
Advances, challenges and opportunities in creating data for trustworthy AI
As artificial intelligence (AI) transitions from research to deployment, creating the appropriate
datasets and data pipelines to develop and evaluate AI models is increasingly the biggest …
DINOv2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
Diffusion art or digital forgery? Investigating data replication in diffusion models
Cutting-edge diffusion models produce images with high quality and customizability,
enabling them to be used for commercial art and graphic design purposes. But do diffusion …
Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners
Visual recognition in low-data regimes requires deep neural networks to learn generalized
representations from limited training samples. Recently, CLIP-based methods have shown …
Rethinking semantic segmentation: A prototype view
Prevalent semantic segmentation solutions, despite their different network designs (FCN
based or attention based) and mask decoding strategies (parametric softmax based or pixel …
Masked feature prediction for self-supervised visual pre-training
We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training
of video models. Our approach first randomly masks out a portion of the input sequence and …
SimMIM: A simple framework for masked image modeling
This paper presents SimMIM, a simple framework for masked image modeling. We have
simplified recently proposed relevant approaches, without the need for special designs …
Contrastive test-time adaptation
Test-time adaptation is a special setting of unsupervised domain adaptation in which a
model trained on the source domain must adapt to the target domain without accessing source …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …