A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Better together: Dialogue separation and voice activity detection for audio personalization in TV

M Torcoli, EAP Habets - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
In TV services, dialogue level personalization is key to meeting user preferences and needs.
When dialogue and background sounds are not separately available from the production …

Dialogue Understandability: Why are we streaming movies with subtitles?

HB Martinez, A Ragano, D Debnath, A Ullah… - arXiv preprint arXiv …, 2024 - arxiv.org
Watching movies and TV shows with subtitles enabled is not simply down to audibility or
speech intelligibility. A variety of evolving factors related to technological advances, cinema …

Scaling Up Music Information Retrieval Training with Semi-Supervised Learning

YN Hung, JC Wang, M Won, D Le - arXiv preprint arXiv:2310.01353, 2023 - arxiv.org
In the era of data-driven Music Information Retrieval (MIR), the scarcity of labeled data has
been one of the major concerns to the success of an MIR task. In this work, we leverage the …

Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support

KN Warcharasupat, CW Wu… - 2024 IEEE 5th International …, 2024 - ieeexplore.ieee.org
Cinematic audio source separation (CASS), as a problem of extracting the dialogue, music,
and effects stems from their mixture, is a relatively new subtask of audio source separation …

How to Design a Cheap Music Detection System Using a Simple Multilayer Perceptron With Temporal Integration

Z Rafii, E Wold, R Boulderstone - IEEE Signal Processing …, 2024 - ieeexplore.ieee.org
We show how to design a cheap system for detecting when music is present in audio
recordings. We make use of a small neural network consisting of a simple multilayer …

Background music monitoring framework and dataset for TV broadcast audio

H Kim, J Kim, J Park, S Kim, C Park, W Yoo - ETRI Journal, 2024 - Wiley Online Library
Music identification is widely regarded as a solved problem for music searching in quiet
environments, but its performance tends to degrade in TV broadcast audio owing to the …

Speech Data from Radio Broadcasts for Low Resource Languages

BB Odoom, LPG Perera, P Hansanti… - Proceedings of the …, 2024 - aclanthology.org
We created a collection of speech data for 48 low resource languages. The corpus is
extracted from radio broadcasts and processed with novel speech detection and language …

Zero-Shot Crate Digging: DJ Tool Retrieval Using Speech Activity, Music Structure And CLAP Embeddings

I Orife - arXiv preprint arXiv:2411.12209, 2024 - arxiv.org
In genres like Hip-Hop, RnB, Reggae, Dancehall and just about every
Electronic/Dance/Club style, DJ tools are a special set of audio files curated to heighten the …

Pseudo-broadcast music-dialogue and cue sheet dataset for TV broadcast background music identification/separation/detection

김혜미, 김성우, 이상원, 김정현, 박지현… - Journal of Digital …, 2023 - journal.dcs.or.kr
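Abstract: The pseudo-broadcast music-dialogue and cue sheet dataset is a dataset for measuring the performance of technology that
automatically identifies background music inserted in TV broadcasts, and its three main components, broadcast audio …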