A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Better together: Dialogue separation and voice activity detection for audio personalization in TV
M Torcoli, EAP Habets - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
In TV services, dialogue level personalization is key to meeting user preferences and needs.
When dialogue and background sounds are not separately available from the production …
Dialogue Understandability: Why are we streaming movies with subtitles?
Watching movies and TV shows with subtitles enabled is not simply down to audibility or
speech intelligibility. A variety of evolving factors related to technological advances, cinema …
Scaling Up Music Information Retrieval Training with Semi-Supervised Learning
In the era of data-driven Music Information Retrieval (MIR), the scarcity of labeled data has
been one of the major concerns to the success of an MIR task. In this work, we leverage the …
Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support
KN Watcharasupat, CW Wu… - 2024 IEEE 5th International …, 2024 - ieeexplore.ieee.org
Cinematic audio source separation (CASS), as a problem of extracting the dialogue, music,
and effects stems from their mixture, is a relatively new subtask of audio source separation …
How to Design a Cheap Music Detection System Using a Simple Multilayer Perceptron With Temporal Integration
We show how to design a cheap system for detecting when music is present in audio
recordings. We make use of a small neural network consisting of a simple multilayer …
Background music monitoring framework and dataset for TV broadcast audio
H Kim, J Kim, J Park, S Kim, C Park, W Yoo - ETRI Journal, 2024 - Wiley Online Library
Music identification is widely regarded as a solved problem for music searching in quiet
environments, but its performance tends to degrade in TV broadcast audio owing to the …
Speech Data from Radio Broadcasts for Low Resource Languages
BB Odoom, LPG Perera, P Hansanti… - Proceedings of the …, 2024 - aclanthology.org
We created a collection of speech data for 48 low resource languages. The corpus is
extracted from radio broadcasts and processed with novel speech detection and language …
Zero-Shot Crate Digging: DJ Tool Retrieval Using Speech Activity, Music Structure And CLAP Embeddings
I Orife - arXiv preprint arXiv:2411.12209, 2024 - arxiv.org
In genres like Hip-Hop, RnB, Reggae, Dancehall and just about every
Electronic/Dance/Club style, DJ tools are a special set of audio files curated to heighten the …
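Pseudo-broadcast music-dialogue and cue sheet dataset for TV broadcast background music identification/separation/detection
김혜미, 김성우, 이상원, 김정현, 박지현… - Journal of Digital …, 2023 - journal.dcs.or.kr
Abstract: The pseudo-broadcast music-dialogue and cue sheet dataset is a dataset for measuring
the performance of technology that automatically identifies background music inserted in TV broadcasts, with three main elements: broadcast audio …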