Highly accurate phonetic segmentation using boundary correction models and system fusion

MY Liberman - Annual Review of Linguistics, 2019 - annualreviews.org

Semiautomatic analysis of digital speech collections is transforming the science of
phonetics. Convenient search and analysis of large published bodies of recordings …

Cited by 47 · Related articles · All 2 versions

WhisperX: Time-accurate speech transcription of long-form audio

M Bain, J Huh, T Han, A Zisserman - arXiv preprint arXiv:2303.00747, 2023 - arxiv.org

Large-scale, weakly-supervised speech recognition models, such as Whisper, have
demonstrated impressive results on speech recognition across domains and languages …

Cited by 202 · Related articles · All 8 versions

A review: Automatic speech segmentation

AE Sakran, SM Abdou, SE Hamid… - International Journal of …, 2017 - academia.edu

Automated segmentation of speech signals has been under research for over 30 years.
Many speech processing systems require segmentation of speech waveform into principal …

Cited by 29 · Related articles · All 2 versions

ASR-aware end-to-end neural diarization

A Khare, E Han, Y Yang… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

We present a Conformer-based end-to-end neural diarization (EEND) model that uses both
acoustic input and features derived from an automatic speech recognition (ASR) model. Two …

Cited by 23 · Related articles · All 8 versions

Phoneme boundary detection using deep bidirectional LSTMs

J Franke, M Mueller, F Hamlaoui… - … ; 12. ITG Symposium, 2016 - ieeexplore.ieee.org

In this paper we investigate the automatic detection of phoneme boundaries in audio
recordings with the help of deep bidirectional LSTMs. This work is motivated by the needs of …

Cited by 43 · Related articles · All 6 versions
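
Phoneme boundary detectors such as the BiLSTM system above are often scored by counting predicted boundaries that fall within a fixed tolerance of a reference boundary. The sketch below only illustrates that kind of scoring, not the detector itself; the 20 ms tolerance and the greedy one-to-one matching are assumptions, not necessarily the protocol Franke et al. used.

```python
from typing import Sequence


def boundary_scores(
    predicted: Sequence[float],
    reference: Sequence[float],
    tolerance: float = 0.020,  # assumed 20 ms window, in seconds
) -> tuple[float, float, float]:
    """Precision, recall and F1 for boundary times given in seconds.

    A predicted boundary is a hit if it lies within `tolerance` of a
    reference boundary that has not already been matched (greedy
    one-to-one matching, predictions processed in time order).
    """
    ref = sorted(reference)
    matched = [False] * len(ref)
    hits = 0
    for p in sorted(predicted):
        # pick the closest still-unmatched reference boundary in the window
        best, best_dist = None, tolerance
        for i, r in enumerate(ref):
            d = abs(p - r)
            if not matched[i] and d <= best_dist:
                best, best_dist = i, d
        if best is not None:
            matched[best] = True
            hits += 1
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

For example, `boundary_scores([0.105, 0.26, 0.50], [0.10, 0.25, 0.40])` matches two of three boundaries, so precision, recall and F1 are all 2/3.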

Phoneme mispronunciation detection by jointly learning to align

B Lin, L Wang - … 2022-2022 IEEE International Conference on …, 2022 - ieeexplore.ieee.org

Phoneme mispronunciation detection plays an important role in Computer-Assisted
Pronunciation Training. Traditional methods either rely on phone recognition, which has the …

Cited by 11 · Related articles

Phonetic Error Analysis Beyond Phone Error Rate

E Loweimi, A Carmantini, P Bell… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In this article, we analyse the performance of the TIMIT-based phone recognition systems
beyond the overall phone error rate (PER) metric. We consider three broad phonetic classes …

Cited by 1 · Related articles · All 7 versions
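
The Loweimi et al. entry contrasts its analysis with the overall phone error rate. For reference, PER is conventionally (S + D + I) / N: substitutions, deletions and insertions from a minimum-edit alignment of the recognized phones against the N reference phones. A minimal sketch; the phone labels in the usage line are illustrative only.

```python
from typing import Sequence


def phone_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """PER = (substitutions + deletions + insertions) / len(reference)."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one phone")
    # dist[i][j]: minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # i deletions
    for j in range(m + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
    return dist[n][m] / n
```

With reference `sil k ae t sil` and hypothesis `sil k eh t s sil`, there is one substitution and one insertion, so `phone_error_rate(...)` gives 2 / 5 = 0.4.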

A retrieval algorithm of encrypted speech based on syllable-level perceptual hashing

S He, H Zhao - Computer Science and Information Systems, 2017 - doiserbia.nb.rs

To retrieve voice information in a fast and accurate manner over encrypted speech, this
study proposes a retrieval algorithm based on syllable-level perceptual hashing. It …

Cited by 31 · Related articles · All 6 versions

The Mason-Alberta Phonetic Segmenter: a forced alignment system based on deep neural networks and interpolation

MC Kelley, SJ Perry, BV Tucker - Phonetica, 2024 - degruyter.com

Given an orthographic transcription, forced alignment systems automatically determine
boundaries between segments in speech, facilitating the use of large corpora. In the present …

Cited by 2 · Related articles · All 2 versions
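
Forced alignment, as described in the MAPS entry, takes a known transcription and finds segment boundaries. The sketch below is only the generic Viterbi-style dynamic-programming core, not the MAPS method (which its title says is based on deep neural networks and interpolation). It assumes per-frame phone log-probabilities are already available from some acoustic model, and that each phone occupies at least one contiguous run of frames, in order.

```python
import math
from typing import Sequence


def force_align(
    log_probs: Sequence[Sequence[float]],
    phone_ids: Sequence[int],
) -> list[tuple[int, int, int]]:
    """Return (phone_id, start_frame, end_frame) segments, end exclusive.

    log_probs[t][k] is the (assumed) log-probability of phone class k at
    frame t; phone_ids is the known phone sequence to align.
    """
    T, N = len(log_probs), len(phone_ids)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per phone")
    NEG = -math.inf
    # score[t][n]: best total log-score with frame t assigned to phone n
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]  # 0 = stayed in phone n, 1 = entered from n-1
    score[0][0] = log_probs[0][phone_ids[0]]
    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1][n]
            enter = score[t - 1][n - 1] if n > 0 else NEG
            if enter > stay:
                score[t][n], back[t][n] = enter, 1
            else:
                score[t][n], back[t][n] = stay, 0
            score[t][n] += log_probs[t][phone_ids[n]]
    # trace back from the last frame, which must belong to the last phone
    segments = []
    n, end = N - 1, T
    for t in range(T - 1, 0, -1):
        if back[t][n] == 1:  # phone n starts at frame t
            segments.append((phone_ids[n], t, end))
            end, n = t, n - 1
    segments.append((phone_ids[0], 0, end))
    return segments[::-1]
```

Frame indices convert to boundary times by multiplying by the acoustic model's frame shift (often 10 ms, but that depends on the model).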

A zero-resourced indigenous language phones occurrence and durations analysis for an automatic speech recognition system

S Sasmal, Y Saring - International Journal of Information Technology, 2023 - Springer

This research illustrates phone occurrence analysis for an automatic speech recognition
(ASR) model of 'Adi.' 'Adi' is a low-resourced endangered tribal language of Arunachal …

Cited by 2 · Related articles
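
Phone occurrence and duration statistics of the kind the Sasmal and Saring entry describes can be tallied from time-aligned segments. A minimal sketch, assuming (phone, start, end) tuples in seconds, for instance from a forced aligner; the phone labels in the usage example are hypothetical and not taken from Adi.

```python
from collections import Counter, defaultdict
from typing import Iterable


def phone_stats(
    segments: Iterable[tuple[str, float, float]],
) -> dict[str, tuple[int, float]]:
    """Map each phone to (occurrence count, mean duration in seconds)."""
    counts = Counter()
    total_dur = defaultdict(float)
    for phone, start, end in segments:
        counts[phone] += 1
        total_dur[phone] += end - start
    return {p: (counts[p], total_dur[p] / counts[p]) for p in counts}
```

For example, `phone_stats([("a", 0.0, 0.1), ("d", 0.1, 0.15), ("a", 0.15, 0.35)])` reports two occurrences of "a" with a mean duration of about 0.15 s and one "d" of about 0.05 s.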