A light weight model for active speaker detection
Active speaker detection is a challenging task in audio-visual scenarios, with the aim to
detect who is speaking in one or more speaker scenarios. This task has received …
Egocentric auditory attention localization in conversations
In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …
Target active speaker detection with audio-visual cues
In active speaker detection (ASD), we would like to detect whether an on-screen person is
speaking based on audio-visual cues. Previous studies have primarily focused on modeling …
Audio-Visual Speaker Diarization: Current Databases, Approaches and Challenges
Nowadays, the large amount of audio-visual content available has fostered the need to
develop new robust automatic speaker diarization systems to analyse and characterise it …
TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning
The goal of this work is Active Speaker Detection (ASD), a task to determine whether a
person is speaking or not in a series of video frames. Previous works have dealt with the …
Audio-visual activity guided cross-modal identity association for active speaker detection
R Sharma, S Narayanan - IEEE Open Journal of Signal …, 2023 - ieeexplore.ieee.org
Active speaker detection in videos addresses associating a source face, visible in the video
frames, with the underlying speech in the audio modality. The two primary sources of …
A novel framework for multi-person temporal gaze following and social gaze prediction
Gaze following and social gaze prediction are fundamental tasks providing insights into
human communication behaviors, intent, and social interactions. Most previous approaches …
Improving audiovisual active speaker detection in egocentric recordings with the data-efficient image transformer
Future augmented reality devices have the capacity to enhance human perception and
provide assistive functions in complex communication scenarios. Active speaker detection …
Modeling long-term multimodal representations for active speaker detection with spatio-positional encoder
M Kyoung, HJ Song - IEEE Access, 2023 - ieeexplore.ieee.org
In this study, we present an end-to-end framework for active speaker detection to achieve
robust performance in challenging scenarios with multiple speakers. In contrast to recent …
Multiple Information-Aware Recurrent Reasoning Network for Joint Dialogue Act Recognition and Sentiment Classification
S Li, X Chen - Information, 2023 - mdpi.com
The task of joint dialogue act recognition (DAR) and sentiment classification (DSC) aims to
predict both the act and sentiment labels of each utterance in a dialogue. Existing methods …