Training audio captioning models without audio

S Deshmukh, B Elizalde… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Automated Audio Captioning (AAC) is the task of generating natural language descriptions
given an audio stream. A typical AAC system requires manually curated training data of …

Perceptual–neural–physical sound matching

H Han, V Lostanlen, M Lagrange - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Sound matching algorithms seek to approximate a target waveform by parametric audio
synthesis. Deep neural networks have achieved promising results in matching sustained …

Classifying non-individual head-related transfer functions with a computational auditory model: Calibration and metrics

R Daugintis, R Barumerli, L Picinali… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
This study explores the use of a multi-feature Bayesian auditory sound localisation model to
classify non-individual head-related transfer functions (HRTFs). Based on predicted sound …

Semantically-informed deep neural networks for sound recognition

M Esposito, G Valente… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Deep neural networks (DNNs) for sound recognition learn to categorize a barking sound as
a "dog" and a meowing sound as a "cat" but do not exploit information inherent to the …

An Approach to Ontological Learning from Weak Labels

A Shah, L Tang, PH Chou, YY Zheng… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Ontologies encompass a formal representation of knowledge through the definition of
concepts or properties of a domain, and the relationships between those concepts. In this …

Audio Entailment: Assessing Deductive Reasoning for Audio Understanding

S Deshmukh, S Han, H Bukhari, B Elizalde… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent literature uses language to build foundation models for audio. These Audio-
Language Models (ALMs) are trained on a vast number of audio-text pairs and show …

[PDF] Steering latent audio models through interactive machine learning

G Vigliensoni, R Fiebrink - 2023 - ualresearchonline.arts.ac.uk
In this paper, we present a proof-of-concept mechanism for steering latent audio models
through interactive machine learning. Our approach involves mapping the human …

Using Machine Learning to Understand the Relationships Between Audiometric Data, Speech Perception, Temporal Processing, And Cognition

RM Khalil, A Papanicolaou, RT Chou… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Aging and hearing loss cause communication difficulties, particularly for speech perception
in demanding situations, which have been associated with factors including cognitive …

[BOOK] Listening: The Key Concepts

ES Parks, MH Faw, LR Lane - 2024 - books.google.com
A vital and comprehensive starting place for understanding the key concepts, this book
explores 177 diverse types and styles of listening named in academic scholarship to date …

Perceptual Analysis of Speaker Embeddings for Voice Discrimination between Machine And Human Listening

I Thoidis, C Gaultier, T Goehring - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
This study investigates the information captured by speaker embeddings with relevance to
human speech perception. A Convolutional Neural Network was trained to perform one-shot …