Pengi: An audio language model for audio tasks
S Deshmukh, B Elizalde, R Singh… - Advances in Neural …, 2023 - proceedings.neurips.cc
In the domain of audio processing, Transfer Learning has facilitated the rise of Self-
Supervised Learning and Zero-Shot Learning techniques. These approaches have led to …
Pvass-mdd: predictive visual-audio alignment self-supervision for multimodal deepfake detection
Deepfake techniques can forge the visual or audio signals in the video, which leads to
inconsistencies between visual and audio (VA) signals. Therefore, multimodal detection …
Self-labeling with feature transfer for speech emotion recognition
Most speech emotion recognition methods based on frames have obtained good results in
many applications. However, they segment each speech sample into smaller frames that are …
Interpreting glottal flow dynamics for detecting covid-19 from voice
S Deshmukh, M Al Ismail… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
In the pathogenesis of COVID-19, impairment of respiratory functions is often one of the key
symptoms. Studies show that in these cases, voice production is also adversely affected …
A multi-modal wildfire prediction and early-warning system based on a novel machine learning framework
Wildfires are increasingly impacting the environment and human health. Among the top 20
California wildfires, those in 2020–2021 burned more acres than the last century combined …
Audio retrieval with wavtext5k and clap training
Audio-Text retrieval takes a natural language query to retrieve relevant audio files in a
database. Conversely, Text-Audio retrieval takes an audio file as a query to retrieve relevant …
A personalized respiratory disease exacerbation prediction technique based on a novel spatio-temporal machine learning architecture and local …
RT Bhowmik, SP Most - Electronics, 2022 - mdpi.com
Chronic respiratory diseases, such as the Chronic Obstructive Pulmonary Disease (COPD)
and asthma, are a serious health crisis, affecting a large number of people globally and …
Weakly supervised u-net with limited upsampling for sound event detection
Featured Application Audio classification; music information retrieval; audio scene
characterization; temporal localization of sound sources; audio indexing; audio surveillance …
Sound event detection guided by semantic contexts of scenes
Some studies have revealed that contexts of scenes (e.g., "home," "office," and "cooking") are
advantageous for sound event detection (SED). Mobile devices and sensing technologies …
A multi-modal wildfire prediction and personalized early-warning system based on a novel machine learning framework
RT Bhowmik - arXiv preprint arXiv:2208.09079, 2022 - arxiv.org
Wildfires are increasingly impacting the environment, human health and safety. Among the
top 20 California wildfires, those in 2020-2021 burned more acres than the last century …