- Academic resource search

A light weight model for active speaker detection

J Liao, H Duan, K Feng, W Zhao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Active speaker detection is a challenging task in audio-visual scenarios, with the aim to
detect who is speaking in one or more speaker scenarios. This task has received …

Cited by 23 - Related articles - All 8 versions

[PDF] thecvf.com

Egocentric auditory attention localization in conversations

F Ryan, H Jiang, A Shukla… - Proceedings of the …, 2023 - openaccess.thecvf.com

In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …

Cited by 13 - Related articles - All 7 versions

[PDF] arxiv.org

Target active speaker detection with audio-visual cues

Y Jiang, R Tao, Z Pan, H Li - arXiv preprint arXiv:2305.12831, 2023 - arxiv.org

In active speaker detection (ASD), we would like to detect whether an on-screen person is
speaking based on audio-visual cues. Previous studies have primarily focused on modeling …

Cited by 14 - Related articles - All 6 versions

[PDF] arxiv.org

Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges

V Mingote, A Ortega, A Miguel, E Lleida - arXiv preprint arXiv:2409.05659, 2024 - arxiv.org

Nowadays, the large amount of audio-visual content available has fostered the need to
develop new robust automatic speaker diarization systems to analyse and characterise it …

TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

C Jung, S Lee, K Nam, K Rho, YJ Kim… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

The goal of this work is Active Speaker Detection (ASD), a task to determine whether a
person is speaking or not in a series of video frames. Previous works have dealt with the …

Cited by 4 - Related articles - All 5 versions

[PDF] ieee.org

Audio-visual activity guided cross-modal identity association for active speaker detection

R Sharma, S Narayanan - IEEE Open Journal of Signal …, 2023 - ieeexplore.ieee.org

Active speaker detection in videos addresses associating a source face, visible in the video
frames, with the underlying speech in the audio modality. The two primary sources of …

Cited by 6 - Related articles - All 3 versions

[PDF] arxiv.org

A novel framework for multi-person temporal gaze following and social gaze prediction

A Gupta, S Tafasca, A Farkhondeh, P Vuillecard… - arXiv preprint arXiv …, 2024 - arxiv.org

Gaze following and social gaze prediction are fundamental tasks providing insights into
human communication behaviors, intent, and social interactions. Most previous approaches …

Cited by 2 - Related articles - All 2 versions

[PDF] whiterose.ac.uk

Improving audiovisual active speaker detection in egocentric recordings with the data-efficient image transformer

J Clarke, Y Gotoh, S Goetze - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org

Future augmented reality devices have the capacity to enhance human perception and
provide assistive functions in complex communication scenarios. Active speaker detection …

Cited by 1 - Related articles - All 3 versions

[PDF] ieee.org

Modeling long-term multimodal representations for active speaker detection with spatio-positional encoder

M Kyoung, HJ Song - IEEE Access, 2023 - ieeexplore.ieee.org

In this study, we present an end-to-end framework for active speaker detection to achieve
robust performance in challenging scenarios with multiple speakers. In contrast to recent …

Cited by 1 - Related articles - All 2 versions

[PDF] mdpi.com

Multiple Information-Aware Recurrent Reasoning Network for Joint Dialogue Act Recognition and Sentiment Classification

S Li, X Chen - Information, 2023 - mdpi.com

The task of joint dialogue act recognition (DAR) and sentiment classification (DSC) aims to
predict both the act and sentiment labels of each utterance in a dialogue. Existing methods …

Cited by 1 - Related articles - All 3 versions