Weakly supervised object localization and detection: A survey
As an emerging and challenging problem in the computer vision community, weakly
supervised object localization and detection plays an important role for developing new …
supervised object localization and detection plays an important role for developing new …
Semantic image segmentation: Two decades of research
Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer
vision applications, providing key information for the global understanding of an image. This …
vision applications, providing key information for the global understanding of an image. This …
Multi-class token transformer for weakly supervised semantic segmentation
This paper proposes a new transformer-based framework to learn class-specific object
localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS) …
localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS) …
Learning affinity from attention: End-to-end weakly-supervised semantic segmentation with transformers
Weakly-supervised semantic segmentation (WSSS) with image-level labels is an important
and challenging task. Due to the high training efficiency, end-to-end solutions for WSSS …
and challenging task. Due to the high training efficiency, end-to-end solutions for WSSS …
Token contrast for weakly-supervised semantic segmentation
Abstract Weakly-Supervised Semantic Segmentation (WSSS) using image-level labels
typically utilizes Class Activation Map (CAM) to generate the pseudo labels. Limited by the …
typically utilizes Class Activation Map (CAM) to generate the pseudo labels. Limited by the …
Conformer: Local features coupling global representations for visual recognition
Abstract Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but experience difficulty to capture global representations. Within …
at extracting local features but experience difficulty to capture global representations. Within …
Hts-at: A hierarchical token-semantic audio transformer for sound classification and detection
Audio classification is an important task of mapping audio samples into their corresponding
labels. Recently, the transformer model with self-attention mechanisms has been adopted in …
labels. Recently, the transformer model with self-attention mechanisms has been adopted in …
Generative prompt model for weakly supervised object localization
Weakly supervised object localization (WSOL) remains challenging when learning object
localization models from image category labels. Conventional methods that discriminatively …
localization models from image category labels. Conventional methods that discriminatively …
Transmix: Attend to mix for vision transformers
Mixup-based augmentation has been found to be effective for generalizing models during
training, especially for Vision Transformers (ViTs) since they can easily overfit. However …
training, especially for Vision Transformers (ViTs) since they can easily overfit. However …
Clip surgery for better explainability with enhancement in open-vocabulary tasks
Contrastive Language-Image Pre-training (CLIP) is a powerful multimodal large vision
model that has demonstrated significant benefits for downstream tasks, including many zero …
model that has demonstrated significant benefits for downstream tasks, including many zero …