- Academic resource search

Vision transformers need registers

T Darcet, M Oquab, J Mairal, P Bojanowski - arXiv preprint arXiv …, 2023 - arxiv.org

Transformers have recently emerged as a powerful tool for learning visual representations.
In this paper, we identify and characterize artifacts in feature maps of both supervised and …

Cited by 225 · Related articles · All 10 versions

[PDF] thecvf.com

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

M Hamilton, A Zisserman… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present DenseAV, a novel dual encoder grounding architecture that learns
high-resolution, semantically meaningful, and audio-visual aligned features solely through …

Cited by 4 · Related articles · All 5 versions

[PDF] thecvf.com

On Train-Test Class Overlap and Detection for Image Retrieval

CH Song, J Yoon, T Hwang, S Choi… - Proceedings of the …, 2024 - openaccess.thecvf.com

How important is it for training and evaluation sets to not have class overlap in image
retrieval? We revisit Google Landmarks v2 clean, the most popular training set, by identifying …

Cited by 3 · Related articles · All 3 versions

[PDF] thecvf.com

ULTRON: Unifying Local Transformer and Convolution for Large-scale Image Retrieval

M Kweon, J Park - Proceedings of the Asian Conference on …, 2024 - openaccess.thecvf.com

In large-scale image retrieval, the primary goal is to extract discriminative features and
embed them into global image representations. Previous methods based on CNNs …

Occlusion-Aware Seamless Segmentation

Y Cao, J Zhang, H Shi, K Peng, Y Zhang… - arXiv preprint arXiv …, 2024 - Springer

Panoramic images can broaden the Field of View (FoV), occlusion-aware prediction can
deepen the understanding of the scene, and domain adaptation can transfer across viewing …

A research for sound event localization and detection based on local–global adaptive fusion and temporal importance network

D Shi, M Guo, M Ma - Multimedia Systems, 2024 - Springer

Sound event localization and detection systems can provide intelligent sound processing
and analysis functions for various application devices. However, existing deep learning …