Easycom: An augmented reality dataset to support algorithms for easy communication in noisy...

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Cited by 657 Related articles All 13 versions

[PDF] arxiv.org

Robust self-supervised audio-visual speech recognition

B Shi, WN Hsu, A Mohamed - arXiv preprint arXiv:2201.01763, 2022 - arxiv.org

Audio-based automatic speech recognition (ASR) degrades significantly in noisy
environments and is particularly vulnerable to interfering speech, as the model cannot …

Cited by 96 Related articles All 5 versions

[PDF] thecvf.com

Egocentric audio-visual object localization

C Huang, Y Tian, A Kumar… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Humans naturally perceive surrounding scenes by unifying sound and sight in a first-person
view. Likewise, machines are advanced to approach human intelligence by learning with …

Cited by 16 Related articles All 5 versions

[HTML] springer.com

[HTML] An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer

What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Cited by 14 Related articles All 7 versions

[PDF] thecvf.com

Egocentric deep multi-channel audio-visual active speaker localization

H Jiang, C Murdock, VK Ithapu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Augmented reality devices have the potential to enhance human perception and enable
other assistive functionalities in complex conversational environments. Effectively capturing …

Cited by 32 Related articles All 6 versions

[PDF] ieee.org

Parametric ambisonic encoding of arbitrary microphone arrays

L McCormack, A Politis, R Gonzalez… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org

This article proposes a parametric signal-dependent method for the task of encoding
microphone array signals into Ambisonic signals. The proposed method is presented and …

Cited by 29 Related articles All 12 versions

[PDF] thecvf.com

Egocentric auditory attention localization in conversations

F Ryan, H Jiang, A Shukla… - Proceedings of the …, 2023 - openaccess.thecvf.com

In a noisy conversation environment such as a dinner party, people often exhibit selective
auditory attention, or the ability to focus on a particular speaker while tuning out others …

Cited by 8 Related articles All 7 versions

[PDF] thecvf.com

Revise: Self-supervised speech resynthesis with visual input for universal and generalized speech regeneration

WN Hsu, T Remez, B Shi… - Proceedings of the …, 2023 - openaccess.thecvf.com

Prior works on improving speech quality with visual input typically study each type of
auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present …

Cited by 7 Related articles All 3 versions

[HTML] sagepub.com

Sound source selection based on head movements in natural group conversation

H Lu, WO Brimijoin - Trends in Hearing, 2022 - journals.sagepub.com

To optimally improve signal-to-noise ratio in noisy environments, a hearing assistance
device must correctly identify what is signal and what is noise. Many of the biosignal-based …

Cited by 7 Related articles All 8 versions

An introduction to the speech enhancement for augmented reality (spear) challenge

P Guiraud, S Hafezi, PA Naylor… - … on Acoustic Signal …, 2022 - ieeexplore.ieee.org

It is well known that microphone arrays can be used to enhance a target speaker in a noisy,
reverberant environment, with both spatial (e.g., beamforming) and statistical (e.g., source …

Cited by 14 Related articles