Pengi: An audio language model for audio tasks

S Deshmukh, B Elizalde, R Singh… - Advances in Neural …, 2023 - proceedings.neurips.cc
In the domain of audio processing, Transfer Learning has facilitated the rise of Self-
Supervised Learning and Zero-Shot Learning techniques. These approaches have led to …

PVASS-MDD: predictive visual-audio alignment self-supervision for multimodal deepfake detection

Y Yu, X Liu, R Ni, S Yang, Y Zhao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deepfake techniques can forge the visual or audio signals in the video, which leads to
inconsistencies between visual and audio (VA) signals. Therefore, multimodal detection …

Self-labeling with feature transfer for speech emotion recognition

G Wen, H Liao, H Li, P Wen, T Zhang, S Gao… - Knowledge-Based …, 2022 - Elsevier
Most speech emotion recognition methods based on frames have obtained good results in
many applications. However, they segment each speech sample into smaller frames that are …

Interpreting glottal flow dynamics for detecting COVID-19 from voice

S Deshmukh, M Al Ismail… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
In the pathogenesis of COVID-19, impairment of respiratory functions is often one of the key
symptoms. Studies show that in these cases, voice production is also adversely affected …

A multi-modal wildfire prediction and early-warning system based on a novel machine learning framework

RT Bhowmik, YS Jung, JA Aguilera, M Prunicki… - Journal of environmental …, 2023 - Elsevier
Wildfires are increasingly impacting the environment and human health. Among the top 20
California wildfires, those in 2020–2021 burned more acres than the last century combined …

Audio retrieval with WavText5K and CLAP training

S Deshmukh, B Elizalde, H Wang - arXiv preprint arXiv:2209.14275, 2022 - arxiv.org
Audio-Text retrieval takes a natural language query to retrieve relevant audio files in a
database. Conversely, Text-Audio retrieval takes an audio file as a query to retrieve relevant …

A personalized respiratory disease exacerbation prediction technique based on a novel spatio-temporal machine learning architecture and local …

RT Bhowmik, SP Most - Electronics, 2022 - mdpi.com
Chronic respiratory diseases, such as the Chronic Obstructive Pulmonary Disease (COPD)
and asthma, are a serious health crisis, affecting a large number of people globally and …

Weakly supervised U-Net with limited upsampling for sound event detection

S Lee, H Kim, GJ Jang - Applied Sciences, 2023 - mdpi.com
Featured Application: Audio classification; music information retrieval; audio scene
characterization; temporal localization of sound sources; audio indexing; audio surveillance …

Sound event detection guided by semantic contexts of scenes

N Tonami, K Imoto, R Nagase… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Some studies have revealed that contexts of scenes (e.g., "home," "office," and "cooking") are
advantageous for sound event detection (SED). Mobile devices and sensing technologies …

A multi-modal wildfire prediction and personalized early-warning system based on a novel machine learning framework

RT Bhowmik - arXiv preprint arXiv:2208.09079, 2022 - arxiv.org
Wildfires are increasingly impacting the environment, human health and safety. Among the
top 20 California wildfires, those in 2020-2021 burned more acres than the last century …