Active Generalized Category Discovery

S Ma, F Zhu, Z Zhong, XY Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Generalized Category Discovery (GCD) is a pragmatic and challenging open-world
task which endeavors to cluster unlabeled samples from both novel and old classes …

CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training

Y Guo, S Sun, S Ma, K Zheng, X Bao… - Proceedings of the …, 2024 - openaccess.thecvf.com
Learning joint and coordinated features across modalities is essential for many audio-visual
tasks. Existing pre-training methods primarily focus on global information, neglecting fine …

CoReS: Orchestrating the Dance of Reasoning and Segmentation

X Bao, S Sun, S Ma, K Zheng, Y Guo, G Zhao… - … on Computer Vision, 2025 - Springer
The reasoning segmentation task, which demands a nuanced comprehension of intricate
queries to accurately pinpoint object regions, is attracting increasing attention. However …

Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

Y Guo, S Ma, Y Zhao, H Su… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Audio-Visual Source Localization (AVSL) is the task of identifying specific sounding objects
in the scene given audio cues. In our work, we focus on semi-supervised AVSL with pseudo …

Unveiling Visual Biases in Audio-Visual Localization Benchmarks

L Chen, Z Yue, B Xu, Q Jin - arXiv preprint arXiv:2409.06709, 2024 - arxiv.org
Audio-Visual Source Localization (AVSL) aims to localize the source of sound within a
video. In this paper, we identify a significant issue in existing benchmarks: the sounding …