Simple recurrence improves masked language models


Articles citing "Simple recurrence improves masked language models" (4 results):

[PDF] mlr.press

Cramming: Training a Language Model on a single GPU in one day.

J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press

Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …

Cited by 74 · 7 versions

[PDF] arxiv.org

DurIAN-E: Duration informed attention network for expressive text-to-speech synthesis

Y Gu, Y Bian, G Lei, C Weng, D Su - arXiv preprint arXiv:2309.12792, 2023 - arxiv.org

This paper introduces an improved duration informed attention neural network (DurIAN-E)
for expressive and high-fidelity text-to-speech (TTS) synthesis. Inherited from the original …

Cited by 3 · 2 versions

[PDF] arxiv.org

Parallelizing Linear Transformers with the Delta Rule over Sequence Length

S Yang, B Wang, Y Zhang, Y Shen, Y Kim - arXiv preprint arXiv …, 2024 - arxiv.org

Transformers with linear attention (i.e., linear transformers) and state-space models have
recently been suggested as a viable linear-time alternative to transformers with softmax …

Cited by 13 · 2 versions

[PDF] arxiv.org

DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis

Y Gu, Q Zhu, G Lei, C Weng… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

This paper proposes an improved version of DurIAN-E (DurIAN-E 2), which is also a
duration informed attention neural network for expressive and high-fidelity text-to-speech …