Corpus phonetics

MY Liberman - Annual Review of Linguistics, 2019 - annualreviews.org
Semiautomatic analysis of digital speech collections is transforming the science of
phonetics. Convenient search and analysis of large published bodies of recordings …

WhisperX: Time-accurate speech transcription of long-form audio

M Bain, J Huh, T Han, A Zisserman - arXiv preprint arXiv:2303.00747, 2023 - arxiv.org
Large-scale, weakly-supervised speech recognition models, such as Whisper, have
demonstrated impressive results on speech recognition across domains and languages …

A review: Automatic speech segmentation

AE Sakran, SM Abdou, SE Hamid… - International Journal of …, 2017 - academia.edu
Automated segmentation of speech signals has been under research for over 30 years.
Many speech processing systems require segmentation of speech waveform into principal …

ASR-aware end-to-end neural diarization

A Khare, E Han, Y Yang… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
We present a Conformer-based end-to-end neural diarization (EEND) model that uses both
acoustic input and features derived from an automatic speech recognition (ASR) model. Two …

Phoneme boundary detection using deep bidirectional LSTMs

J Franke, M Mueller, F Hamlaoui… - … ; 12. ITG Symposium, 2016 - ieeexplore.ieee.org
In this paper we investigate the automatic detection of phoneme boundaries in audio
recordings with the help of deep bidirectional LSTMs. This work is motivated by the needs of …

Phoneme mispronunciation detection by jointly learning to align

B Lin, L Wang - … 2022-2022 IEEE International Conference on …, 2022 - ieeexplore.ieee.org
Phoneme mispronunciation detection plays an important role in Computer-Assisted
Pronunciation Training. Traditional methods either rely on phone recognition, which has the …

Phonetic Error Analysis Beyond Phone Error Rate

E Loweimi, A Carmantini, P Bell… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this article, we analyse the performance of the TIMIT-based phone recognition systems
beyond the overall phone error rate (PER) metric. We consider three broad phonetic classes …

A retrieval algorithm of encrypted speech based on syllable-level perceptual hashing

S He, H Zhao - Computer Science and Information Systems, 2017 - doiserbia.nb.rs
To retrieve voice information in a fast and accurate manner over encrypted speech, this
study proposes a retrieval algorithm based on syllable-level perceptual hashing. It …

The Mason-Alberta Phonetic Segmenter: a forced alignment system based on deep neural networks and interpolation

MC Kelley, SJ Perry, BV Tucker - Phonetica, 2024 - degruyter.com
Given an orthographic transcription, forced alignment systems automatically determine
boundaries between segments in speech, facilitating the use of large corpora. In the present …

A zero-resourced indigenous language phones occurrence and durations analysis for an automatic speech recognition system

S Sasmal, Y Saring - International Journal of Information Technology, 2023 - Springer
This research illustrates phone occurrence analysis for an automatic speech recognition
(ASR) model of 'Adi.' 'Adi' is a low-resourced endangered tribal language of Arunachal …