Ego4D: Around the world in 3,000 hours of egocentric video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
Robust self-supervised audio-visual speech recognition
Audio-based automatic speech recognition (ASR) degrades significantly in noisy
environments and is particularly vulnerable to interfering speech, as the model cannot …
Egocentric audio-visual object localization
Humans naturally perceive surrounding scenes by unifying sound and sight in a first-person
view. Likewise, machines are advanced to approach human intelligence by learning with …
An outlook into the future of egocentric vision
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …
Egocentric deep multi-channel audio-visual active speaker localization
Augmented reality devices have the potential to enhance human perception and enable
other assistive functionalities in complex conversational environments. Effectively capturing …
Parametric ambisonic encoding of arbitrary microphone arrays
L McCormack, A Politis, R Gonzalez… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
This article proposes a parametric signal-dependent method for the task of encoding
microphone array signals into Ambisonic signals. The proposed method is presented and …
Egocentric auditory attention localization in conversations
In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …
ReVISE: Self-supervised speech resynthesis with visual input for universal and generalized speech regeneration
Prior works on improving speech quality with visual input typically study each type of
auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present …
Sound source selection based on head movements in natural group conversation
H Lu, WO Brimijoin - Trends in Hearing, 2022 - journals.sagepub.com
To optimally improve signal-to-noise ratio in noisy environments, a hearing assistance
device must correctly identify what is signal and what is noise. Many of the biosignal-based …
An introduction to the speech enhancement for augmented reality (SPEAR) challenge
It is well known that microphone arrays can be used to enhance a target speaker in a noisy,
reverberant environment, with both spatial (e.g., beamforming) and statistical (e.g., source …