A light weight model for active speaker detection

J Liao, H Duan, K Feng, W Zhao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Active speaker detection is a challenging task in audio-visual scenarios, with the aim to
detect who is speaking in one or more speaker scenarios. This task has received …

Egocentric auditory attention localization in conversations

F Ryan, H Jiang, A Shukla… - Proceedings of the …, 2023 - openaccess.thecvf.com
In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …

Target active speaker detection with audio-visual cues

Y Jiang, R Tao, Z Pan, H Li - arXiv preprint arXiv:2305.12831, 2023 - arxiv.org
In active speaker detection (ASD), we would like to detect whether an on-screen person is
speaking based on audio-visual cues. Previous studies have primarily focused on modeling …

Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges

V Mingote, A Ortega, A Miguel, E Lleida - arXiv preprint arXiv:2409.05659, 2024 - arxiv.org
Nowadays, the large amount of audio-visual content available has fostered the need to
develop new robust automatic speaker diarization systems to analyse and characterise it …

TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning

C Jung, S Lee, K Nam, K Rho, YJ Kim… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
The goal of this work is Active Speaker Detection (ASD), a task to determine whether a
person is speaking or not in a series of video frames. Previous works have dealt with the …

Audio-visual activity guided cross-modal identity association for active speaker detection

R Sharma, S Narayanan - IEEE Open Journal of Signal …, 2023 - ieeexplore.ieee.org
Active speaker detection in videos addresses associating a source face, visible in the video
frames, with the underlying speech in the audio modality. The two primary sources of …

A novel framework for multi-person temporal gaze following and social gaze prediction

A Gupta, S Tafasca, A Farkhondeh, P Vuillecard… - arXiv preprint arXiv …, 2024 - arxiv.org
Gaze following and social gaze prediction are fundamental tasks providing insights into
human communication behaviors, intent, and social interactions. Most previous approaches …

Improving audiovisual active speaker detection in egocentric recordings with the data-efficient image transformer

J Clarke, Y Gotoh, S Goetze - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
Future augmented reality devices have the capacity to enhance human perception and
provide assistive functions in complex communication scenarios. Active speaker detection …

Modeling long-term multimodal representations for active speaker detection with spatio-positional encoder

M Kyoung, HJ Song - IEEE Access, 2023 - ieeexplore.ieee.org
In this study, we present an end-to-end framework for active speaker detection to achieve
robust performance in challenging scenarios with multiple speakers. In contrast to recent …

Multiple Information-Aware Recurrent Reasoning Network for Joint Dialogue Act Recognition and Sentiment Classification

S Li, X Chen - Information, 2023 - mdpi.com
The task of joint dialogue act recognition (DAR) and sentiment classification (DSC) aims to
predict both the act and sentiment labels of each utterance in a dialogue. Existing methods …