Dialect-to-standard normalization: A large-scale multilingual evaluation
O Kuparinen, AM Haddad… - Conference on Empirical …, 2023 - researchportal.helsinki.fi
Text normalization methods have been commonly applied to historical language or user-
generated content, but less often to dialectal transcriptions. In this paper, we introduce …
Transformers in action: weakly supervised action segmentation
The video action segmentation task is regularly explored under weaker forms of supervision,
such as transcript supervision, where a list of actions is easier to obtain than dense frame …
Towards a broad coverage named entity resource: A data-efficient approach for many diverse languages
Parallel corpora are ideal for extracting a multilingual named entity (MNE) resource, i.e., a
dataset of names translated into multiple languages. Prior work on extracting MNE datasets …
Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss
E Georgiou, K Kritsis… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Deep learning Text-to-Speech (TTS) systems have achieved impressive generated speech
quality, close to human parity. However, they suffer from training stability issues and in …
Improving Multi-Speaker ASR With Overlap-Aware Encoding And Monotonic Attention
End-to-end (E2E) multi-speaker speech recognition with the serialized output training (SOT)
strategy demonstrates good performance in modeling diverse speaker scenarios. However …
Monotonic Gaussian regularization of attention for robust automatic speech recognition
The Attention-based Encoder–Decoder (AED) models are one of the most popular
models for Automatic Speech Recognition (ASR). However, instability can occur in AED with …
A Miao Language Speech Synthesis Method Based on Sub-syllable Representations
蔡姗, 王林, 谭棉, 郭胜, 吴磊, 王飞 - 科学技术与工程, 2024 - stae.com.cn
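Speech synthesis for minority languages supports the inheritance, protection, and development of
ethnic culture, but related research remains scarce. To address the synthesis errors that arise when the same word in different tones is pronounced similarly, a sub-syllable-representation-based Miao …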
Manipulating Data Representations for Neural Machine Translation
C Amrhein - 2023 - zora.uzh.ch
In natural language processing, much current research focuses on training larger and larger
models on more and more data. In this thesis, we argue that how data is represented can …