Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners
Visual recognition in low-data regimes requires deep neural networks to learn generalized
representations from limited training samples. Recently, CLIP-based methods have shown …
Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders
Pre-training on abundant image data has become the de facto approach for learning robust 2D representations. In contrast, due to expensive data processing, the paucity of 3D datasets severely hinders …
Not all features matter: Enhancing few-shot CLIP with adaptive prior refinement
The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its
application to diverse downstream vision tasks. To improve its capacity on downstream …
CALIP: Zero-shot enhancement of CLIP with parameter-free attention
Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual
representations with promising zero-shot performance. To further improve its downstream …
Joint-MAE: 2D-3D joint masked autoencoders for 3D point cloud pre-training
Masked Autoencoders (MAE) have shown promising performance in self-supervised
learning for both 2D and 3D computer vision. However, existing MAE-style methods can only …
SparseMAE: Sparse training meets masked autoencoders
Masked Autoencoders (MAE) and their variants have proven effective for pretraining
large-scale Vision Transformers (ViTs). However, small-scale models do not benefit from the …
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions, addressing real-world dynamism. Existing …
Masked Image Modeling Auxiliary Pseudo-Label Propagation with a Clustering Central Rectification Strategy for Cross-Scene Classification
X. Zhang, Y. Zhuang, T. Zhang, C. Li, H. Chen. Remote Sensing, 2024.
Cross-scene classification focuses on establishing an effective domain adaptation (DA) approach to transfer learnable knowledge from the source domain to the target domain, which can be reasonably …
Masked angle-aware autoencoder for remote sensing images
To overcome the inherent domain gap between remote sensing (RS) images and natural
images, some self-supervised representation learning methods have made promising …