TS-CAM: Token semantic coupled attention map for weakly supervised object localization

D Zhang, J Han, G Cheng… - IEEE transactions on …, 2021 - ieeexplore.ieee.org

As an emerging and challenging problem in the computer vision community, weakly
supervised object localization and detection plays an important role for developing new …

Cited by 311 · Related articles · All 9 versions

[PDF] nowpublishers.com

Semantic image segmentation: Two decades of research

G Csurka, R Volpi, B Chidlovskii - Foundations and Trends® …, 2022 - nowpublishers.com

Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer
vision applications, providing key information for the global understanding of an image. This …

Cited by 38 · Related articles · All 7 versions

[PDF] thecvf.com

Multi-class token transformer for weakly supervised semantic segmentation

L Xu, W Ouyang, M Bennamoun… - Proceedings of the …, 2022 - openaccess.thecvf.com

This paper proposes a new transformer-based framework to learn class-specific object
localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS) …

Cited by 212 · Related articles · All 7 versions

[PDF] thecvf.com

Learning affinity from attention: End-to-end weakly-supervised semantic segmentation with transformers

L Ru, Y Zhan, B Yu, B Du - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com

Weakly-supervised semantic segmentation (WSSS) with image-level labels is an important
and challenging task. Due to the high training efficiency, end-to-end solutions for WSSS …

Cited by 201 · Related articles · All 5 versions

[PDF] thecvf.com

Token contrast for weakly-supervised semantic segmentation

L Ru, H Zheng, Y Zhan, B Du - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Weakly-Supervised Semantic Segmentation (WSSS) using image-level labels
typically utilizes Class Activation Map (CAM) to generate the pseudo labels. Limited by the …

Cited by 92 · Related articles · All 8 versions

[PDF] thecvf.com

Conformer: Local features coupling global representations for visual recognition

Z Peng, W Huang, S Gu, L Xie… - Proceedings of the …, 2021 - openaccess.thecvf.com

Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but experience difficulty to capture global representations. Within …

Cited by 695 · Related articles · All 14 versions

[PDF] arxiv.org

HTS-AT: A hierarchical token-semantic audio transformer for sound classification and detection

K Chen, X Du, B Zhu, Z Ma… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Audio classification is an important task of mapping audio samples into their corresponding
labels. Recently, the transformer model with self-attention mechanisms has been adopted in …

Cited by 199 · Related articles · All 8 versions

[PDF] thecvf.com

Generative prompt model for weakly supervised object localization

Y Zhao, Q Ye, W Wu, C Shen… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Weakly supervised object localization (WSOL) remains challenging when learning object
localization models from image category labels. Conventional methods that discriminatively …

Cited by 23 · Related articles · All 7 versions

[PDF] thecvf.com

TransMix: Attend to mix for vision transformers

JN Chen, S Sun, J He, PHS Torr… - Proceedings of the …, 2022 - openaccess.thecvf.com

Mixup-based augmentation has been found to be effective for generalizing models during
training, especially for Vision Transformers (ViTs) since they can easily overfit. However …

Cited by 107 · Related articles · All 7 versions

[PDF] arxiv.org

CLIP surgery for better explainability with enhancement in open-vocabulary tasks

Y Li, H Wang, Y Duan, X Li - arXiv preprint arXiv:2304.05653, 2023 - arxiv.org

Contrastive Language-Image Pre-training (CLIP) is a powerful multimodal large vision
model that has demonstrated significant benefits for downstream tasks, including many zero …

Cited by 71 · Related articles · All 2 versions