Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners

R Zhang, X Hu, B Li, S Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Visual recognition in low-data regimes requires deep neural networks to learn generalized
representations from limited training samples. Recently, CLIP-based methods have shown …

Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders

R Zhang, L Wang, Y Qiao, P Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Pre-training on large-scale image data has become the de facto approach for learning robust 2D representations. In
contrast, due to expensive data acquisition and processing, the paucity of 3D datasets severely hinders …

Not all features matter: Enhancing few-shot clip with adaptive prior refinement

X Zhu, R Zhang, B He, A Zhou… - Proceedings of the …, 2023 - openaccess.thecvf.com
The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its
application to diverse downstream vision tasks. To improve its capacity on downstream …

CALIP: Zero-shot enhancement of CLIP with parameter-free attention

Z Guo, R Zhang, L Qiu, X Ma, X Miao, X He… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual
representations with promising zero-shot performance. To further improve its downstream …

Joint-MAE: 2D-3D joint masked autoencoders for 3D point cloud pre-training

Z Guo, R Zhang, L Qiu, X Li, PA Heng - arXiv preprint arXiv:2302.14007, 2023 - arxiv.org
Masked Autoencoders (MAE) have shown promising performance in self-supervised
learning for both 2D and 3D computer vision. However, existing MAE-style methods can only …

SparseMAE: Sparse training meets masked autoencoders

A Zhou, Y Li, Z Qin, J Liu, J Pan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked Autoencoders (MAE) and their variants have proven to be effective for pretraining
large-scale Vision Transformers (ViTs). However, small-scale models do not benefit from the …

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

J Liu, R Xu, S Yang, R Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained
model to continually changing target distributions, addressing real-world dynamism. Existing …

Adaptive distribution masked autoencoders for continual test-time adaptation

J Liu, R Xu, S Yang, R Zhang, Q Zhang, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model
to continually changing target distributions, addressing real-world dynamism. Existing CTTA …

Masked Image Modeling Auxiliary Pseudo-Label Propagation with a Clustering Central Rectification Strategy for Cross-Scene Classification

X Zhang, Y Zhuang, T Zhang, C Li, H Chen - Remote Sensing, 2024 - mdpi.com
Cross-scene classification focuses on establishing an effective domain adaptation (DA) approach to
transfer learnable knowledge from the source to the target domain, which can be reasonably …

Masked angle-aware autoencoder for remote sensing images

Z Li, B Hou, S Ma, Z Wu, X Guo, B Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
To overcome the inherent domain gap between remote sensing (RS) images and natural
images, some self-supervised representation learning methods have made promising …