OMG-Seg: Is one model good enough for all segmentation?
In this work, we address various segmentation tasks, each traditionally tackled by distinct or
partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently …
Mimic before reconstruct: Enhancing masked autoencoders with feature mimicking
Masked Autoencoders (MAE) have been popular paradigms for large-scale vision
representation pre-training. However, MAE solely reconstructs the low-level RGB signals …
Pre-training with random orthogonal projection image modeling
Masked Image Modeling (MIM) is a powerful self-supervised strategy for visual pre-training
without the use of labels. MIM applies random crops to input images, processes them with …
SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation
K Yin, V Rao, R Jiang, X Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Self-supervised landmark estimation is a challenging task that demands the formation of
locally distinct feature representations to identify sparse facial landmarks in the absence of …
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Q Wang, Y Yin - arXiv preprint arXiv:2306.01929, 2023 - arxiv.org
Inspired by the fact that human brains can emphasize discriminative parts of the input and
suppress irrelevant ones, substantial local mechanisms have been designed to boost the …
w2v-SELD: A Sound Event Localization and Detection Framework for Self-Supervised Spatial Audio Pre-Training
Sound Event Detection and Localization (SELD) constitutes a complex task that depends on
extensive multichannel audio recordings with annotated sound events and their respective …