On biasing transformer attention towards monotonicity

O Kuparinen, AM Haddad… - Conference on Empirical …, 2023 - researchportal.helsinki.fi

Text normalization methods have been commonly applied to historical language or user-
generated content, but less often to dialectal transcriptions. In this paper, we introduce …

Cited by 10 · Related articles · All 9 versions

[PDF] arxiv.org

Transformers in action: weakly supervised action segmentation

J Ridley, H Coskun, DJ Tan, N Navab… - arXiv preprint arXiv …, 2022 - arxiv.org

The video action segmentation task is regularly explored under weaker forms of supervision,
such as transcript supervision, where a list of actions is easier to obtain than dense frame …

Cited by 6 · Related articles · All 2 versions

[PDF] arxiv.org

Towards a broad coverage named entity resource: A data-efficient approach for many diverse languages

S Severini, A Imani, P Dufter, H Schütze - arXiv preprint arXiv:2201.12219, 2022 - arxiv.org

Parallel corpora are ideal for extracting a multilingual named entity (MNE) resource, i.e., a
dataset of names translated into multiple languages. Prior work on extracting MNE datasets …

Cited by 6 · Related articles · All 6 versions

[PDF] arxiv.org

Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss

E Georgiou, K Kritsis… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

Deep learning Text-to-Speech (TTS) systems have achieved impressive generated speech
quality, close to human parity. However, they suffer from training stability issues and in …

Cited by 2 · Related articles · All 4 versions

Improving Multi-Speaker ASR With Overlap-Aware Encoding And Monotonic Attention

T Li, F Wang, W Guan, L Huang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

End-to-end (E2E) multi-speaker speech recognition with the serialized output training (SOT)
strategy demonstrates good performance in modeling diverse speaker scenarios. However …

Cited by 1 · Related articles

Monotonic Gaussian regularization of attention for robust automatic speech recognition

Y Du, M Wu, X Fang, Z Yang - Computer Speech & Language, 2023 - Elsevier

The Attention-based Encoder–Decoder (AED) models are one of the most popular
models for Automatic Speech Recognition (ASR). However, instability can occur in AED with …

Cited by 1 · Related articles · All 2 versions

[PDF] stae.com.cn

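[PDF] A Miao speech synthesis method based on sub-syllable representations

蔡姗，王林，谭棉，郭胜，吴磊，王飞 - Science Technology and Engineering, 2024 - stae.com.cn

Speech synthesis for minority languages helps to pass on, protect, and develop ethnic cultures, but related research is currently scarce.
To address the synthesis errors that tend to occur when the same word in different tones is pronounced similarly, a sub-syllable-representation-based Miao …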

[PDF] uzh.ch

[PDF] Manipulating Data Representations for Neural Machine Translation

C Amrhein - 2023 - zora.uzh.ch

In natural language processing, much current research focuses on training larger and larger
models on more and more data. In this thesis, we argue that how data is represented can …