Audio-aware query-enhanced transformer for audio-visual segmentation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com

Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …

Cited by 2 · Related articles · All 4 versions

Prompting segmentation with sound is generalizable audio-visual source localizer

Y Wang, W Liu, G Li, J Ding, D Hu, X Li - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Never having seen an object and heard its sound simultaneously, can the model still
accurately localize its visual position from the input audio? In this work, we concentrate on …

Cited by 6 · Related articles · All 4 versions

Stepping stones: A progressive training strategy for audio-visual semantic segmentation

J Ma, P Sun, Y Wang, D Hu - arXiv preprint arXiv:2407.11820, 2024 - arxiv.org

Audio-Visual Segmentation (AVS) aims to achieve pixel-level localization of sound sources
in videos, while Audio-Visual Semantic Segmentation (AVSS), as an extension of AVS …

Cited by 2 · Related articles · All 2 versions

Turbo: Informativity-driven acceleration plug-in for vision-language models

C Ju, H Wang, Z Li, X Chen, Z Zhai, W Huang… - arXiv preprint arXiv …, 2023 - arxiv.org

Vision-Language Large Models (VLMs) have become primary backbone of AI, due to the
impressive performance. However, their expensive computation costs, i.e., throughput and …

Cited by 3 · Related articles · All 3 versions

Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

C Ju, H Wang, H Cheng, X Chen, Z Zhai… - arXiv preprint arXiv …, 2024 - arxiv.org

Vision-Language Large Models (VLMs) recently become primary backbone of AI, due to the
impressive performance. However, their expensive computation costs, i.e., throughput and …

CPM: Class-conditional Prompting Machine for Audio-visual Segmentation

Y Chen, C Wang, Y Liu, H Wang, G Carneiro - arXiv preprint arXiv …, 2024 - arxiv.org

Audio-visual segmentation (AVS) is an emerging task that aims to accurately segment
sounding objects based on audio-visual cues. The success of AVS learning systems …

Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation

Y Chen, Y Liu, H Wang, F Liu, C Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Audio-visual segmentation (AVS) is a challenging task that involves accurately segmenting
sounding objects based on audio-visual cues. The effectiveness of audio-visual learning …

Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues

T Chen, Z Tan, T Gong, Q Chu, Y Wu, B Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

How to effectively interact audio with vision has garnered considerable interest within the
multi-modality research field. Recently, a novel audio-visual segmentation (AVS) task has …

Cited by 1 · Related articles · All 4 versions

DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition

H Cheng, C Ju, H Wang, J Liu, M Chen, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

As one of the fundamental video tasks in computer vision, Open-Vocabulary Action
Recognition (OVAR) recently gains increasing attention, with the development of vision …

Cited by 1 · Related articles · All 2 versions

Image to Multi-Modal Retrieval for Industrial Scenarios

Z Cheng, C Ju, X Chen, Z Zhai, S Xiao, X Zeng… - arXiv preprint arXiv …, 2023 - arxiv.org

We formally define a novel valuable information retrieval task: image-to-multi-modal-retrieval
(IMMR), where the query is an image and the doc is an entity with both image and textual …

Cited by 1 · Related articles · All 2 versions