Active generalized category discovery
Generalized Category Discovery (GCD) is a pragmatic and challenging open-world
task which endeavors to cluster unlabeled samples from both novel and old classes …
CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training
Learning joint and coordinated features across modalities is essential for many audio-visual
tasks. Existing pre-training methods primarily focus on global information, neglecting fine …
CoReS: Orchestrating the Dance of Reasoning and Segmentation
The reasoning segmentation task, which demands a nuanced comprehension of intricate
queries to accurately pinpoint object regions, is attracting increasing attention. However …
Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization
Audio-Visual Source Localization (AVSL) is the task of identifying specific sounding objects
in the scene given audio cues. In our work, we focus on semi-supervised AVSL with pseudo …
Unveiling Visual Biases in Audio-Visual Localization Benchmarks
Audio-Visual Source Localization (AVSL) aims to localize the source of sound within a
video. In this paper, we identify a significant issue in existing benchmarks: the sounding …