Correlational image modeling for self-supervised visual pre-training

X Li, H Yuan, W Li, H Ding, S Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

In this work, we address various segmentation tasks, each traditionally tackled by distinct or
partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently …

Cited by 25 Related articles All 3 versions

Mimic before reconstruct: Enhancing masked autoencoders with feature mimicking

P Gao, Z Lin, R Zhang, R Fang, H Li, H Li… - International Journal of …, 2024 - Springer

Masked Autoencoders (MAE) have been popular paradigms for large-scale vision
representation pre-training. However, MAE solely reconstructs the low-level RGB signals …

Cited by 14 Related articles All 4 versions

Pre-training with random orthogonal projection image modeling

M Haghighat, P Moghadam, S Mohamed… - arXiv preprint arXiv …, 2023 - arxiv.org

Masked Image Modeling (MIM) is a powerful self-supervised strategy for visual pre-training
without the use of labels. MIM applies random crops to input images, processes them with …

Cited by 2 Related articles All 4 versions

SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation

K Yin, V Rao, R Jiang, X Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Self-supervised landmark estimation is a challenging task that demands the formation of
locally distinct feature representations to identify sparse facial landmarks in the absence of …

Cited by 1 Related articles All 3 versions

Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work

Q Wang, Y Yin - arXiv preprint arXiv:2306.01929, 2023 - arxiv.org

Inspired by the fact that human brains can emphasize discriminative parts of the input and
suppress irrelevant ones, substantial local mechanisms have been designed to boost the …

Cited by 2 Related articles All 2 versions

w2v-SELD: A Sound Event Localization and Detection Framework for Self-Supervised Spatial Audio Pre-Training

OL Santos, K Rosero, RA Lotufo - arXiv preprint arXiv:2312.06907, 2023 - arxiv.org

Sound Event Detection and Localization (SELD) constitutes a complex task that depends on
extensive multichannel audio recordings with annotated sound events and their respective …

Cited by 1 Related articles All 3 versions