Audio-Visual Segmentation via Unlabeled Frame Exploitation
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …
Prompting segmentation with sound is generalizable audio-visual source localizer
Never having seen an object and heard its sound simultaneously, can the model still
accurately localize its visual position from the input audio? In this work, we concentrate on …
Stepping stones: A progressive training strategy for audio-visual semantic segmentation
Audio-Visual Segmentation (AVS) aims to achieve pixel-level localization of sound sources
in videos, while Audio-Visual Semantic Segmentation (AVSS), as an extension of AVS …
Turbo: Informativity-driven acceleration plug-in for vision-language models
Vision-Language Large Models (VLMs) have become the primary backbone of AI, due to their
impressive performance. However, their expensive computation costs, i.e., throughput and …
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
C Ju, H Wang, H Cheng, X Chen, Z Zhai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language Large Models (VLMs) have recently become the primary backbone of AI, due to their
impressive performance. However, their expensive computation costs, i.e., throughput and …
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Audio-visual segmentation (AVS) is an emerging task that aims to accurately segment
sounding objects based on audio-visual cues. The success of AVS learning systems …
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Audio-visual segmentation (AVS) is a challenging task that involves accurately segmenting
sounding objects based on audio-visual cues. The effectiveness of audio-visual learning …
Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
How to effectively interact audio with vision has garnered considerable interest within the
multi-modality research field. Recently, a novel audio-visual segmentation (AVS) task has …
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
As one of the fundamental video tasks in computer vision, Open-Vocabulary Action
Recognition (OVAR) has recently gained increasing attention, with the development of vision …
Image to Multi-Modal Retrieval for Industrial Scenarios
We formally define a novel valuable information retrieval task: image-to-multi-modal-retrieval
(IMMR), where the query is an image and the doc is an entity with both image and textual …