An Empirical Analysis on the Vulnerabilities of End-to-End Speech Segregation Models

R Parikh, G Rochette, C Espy-Wilson… - arXiv preprint arXiv …, 2022 - arxiv.org
End-to-end learning models have demonstrated a remarkable capability in performing
speech segregation. Despite their wide scope of real-world applications, little is known …

Prediction in polyphony: modelling musical auditory scene analysis

SA Sauvé - 2018 - qmro.qmul.ac.uk
How do we know that a melody is a melody? In other words, how does the human brain
extract melody from a polyphonic musical context? This thesis begins with a theoretical …

A physiologically inspired model for solving the cocktail party problem

KF Chou, J Dong, HS Colburn, K Sen - … of the Association for Research in …, 2019 - Springer
At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues
(e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound …

A biologically oriented algorithm for spatial sound segregation

KF Chou, AD Boyd, V Best, HS Colburn… - Frontiers in …, 2022 - frontiersin.org
Listening in an acoustically cluttered scene remains a difficult task for both machines and
hearing-impaired listeners. Normal-hearing listeners accomplish this task with relative ease …

Cluster analysis for the separation of auditory scenes

MS Daley, LM Bonacci, DH Gever, K Diaz… - IEEE …, 2021 - ieeexplore.ieee.org
The “cocktail party problem” refers to the ability of human listeners to separate the acoustic
signal reaching their ears into its individual components, corresponding to individual sound …

Speech envelope dynamics for noise-robust auditory scene analysis in robotics

F Rea, A Kothig, L Grasse, M Tata - International Journal of …, 2020 - World Scientific
Humans make extensive use of auditory cues to interact with other humans, especially in
challenging real-world acoustic environments. Multiple distinct acoustic events usually mix …

Optimality and limitations of audio-visual integration for cognitive systems

WP Boyce, A Lindsay, A Zgonnikov, I Rañó… - Frontiers in Robotics …, 2020 - frontiersin.org
Multimodal integration is an important process in perceptual decision-making. In humans,
this process has often been shown to be statistically optimal, or near optimal: sensory …

Neural signatures of disordered multi-talker speech perception in adults with normal hearing

A Parthasarathy, KE Hancock, K Bennett, V DeGruttola… - bioRxiv, 2019 - biorxiv.org
In social settings, speech waveforms from nearby speakers mix together in our ear canals.
The brain unmixes the attended speech stream from the chorus of background speakers …

Slow and steady: auditory features for discriminating animal vocalizations

RW Di Tullio, L Wei, V Balasubramanian - bioRxiv, 2024 - biorxiv.org
We propose that listeners can use temporal regularities (spectro-temporal correlations that
change smoothly over time) to discriminate animal vocalizations within and between …

Sensitivity of neural responses in the inferior colliculus to statistical features of sound textures

AP Mishra, F Peng, K Li, NS Harper, JWH Schnupp - Hearing Research, 2021 - Elsevier
Previous psychophysical studies have identified a hierarchy of time-averaged statistics
which determine the identity of natural sound textures. However, it is unclear whether the …