Cramming: Training a Language Model on a single GPU in one day.

J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …

DurIAN-E: Duration informed attention network for expressive text-to-speech synthesis

Y Gu, Y Bian, G Lei, C Weng, D Su - arXiv preprint arXiv:2309.12792, 2023 - arxiv.org
This paper introduces an improved duration informed attention neural network (DurIAN-E)
for expressive and high-fidelity text-to-speech (TTS) synthesis. Inherited from the original …

Parallelizing Linear Transformers with the Delta Rule over Sequence Length

S Yang, B Wang, Y Zhang, Y Shen, Y Kim - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers with linear attention (i.e., linear transformers) and state-space models have
recently been suggested as a viable linear-time alternative to transformers with softmax …

DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis

Y Gu, Q Zhu, G Lei, C Weng… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
This paper proposes an improved version of DurIAN-E (DurIAN-E 2), which is also a
duration informed attention neural network for expressive and high-fidelity text-to-speech …