Cramming: Training a Language Model on a single GPU in one day.
J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …
DurIAN-E: Duration informed attention network for expressive text-to-speech synthesis
This paper introduces an improved duration informed attention neural network (DurIAN-E)
for expressive and high-fidelity text-to-speech (TTS) synthesis. Inherited from the original …
Parallelizing Linear Transformers with the Delta Rule over Sequence Length
Transformers with linear attention (i.e., linear transformers) and state-space models have
recently been suggested as a viable linear-time alternative to transformers with softmax …
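For context on why linear attention is linear-time, below is a minimal sketch of plain causal linear attention: instead of materializing the n x n softmax matrix, a running state of feature-mapped key/value outer products is maintained, so each step costs O(d^2) independent of sequence length. This is an illustrative assumption-laden sketch of the generic kernel-trick formulation, not the delta-rule update or the sequence-length parallelization proposed in the paper above; the function name and feature map are hypothetical.

```python
import numpy as np

def causal_linear_attention(Q, K, V, feature_map=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Causal linear attention in O(n * d^2) time (generic sketch, not DeltaNet).

    A running state S_t = sum_{i<=t} phi(k_i) v_i^T and normalizer
    z_t = sum_{i<=t} phi(k_i) replace the n x n attention matrix.
    """
    n, d = Q.shape
    S = np.zeros((d, V.shape[1]))   # running sum of outer products phi(k) v^T
    z = np.zeros(d)                 # running sum of phi(k) for normalization
    out = np.zeros_like(V)
    for t in range(n):
        q, k, v = feature_map(Q[t]), feature_map(K[t]), V[t]
        S += np.outer(k, v)
        z += k
        out[t] = (q @ S) / (q @ z + 1e-6)
    return out

# Toy usage with random inputs.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8, 4))
print(causal_linear_attention(Q, K, V).shape)  # (8, 4)
```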
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis
This paper proposes an improved version of DurIAN-E (DurIAN-E 2), which is also a
duration informed attention neural network for expressive and high-fidelity text-to-speech …