Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models

D Wagner, I Baumann, K Riedhammer… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper explores improving post-training quantization (PTQ) applied after knowledge
distillation in the Whisper family of speech foundation models. We address the challenge of …
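
For orientation, the sketch below shows a generic post-training dynamic quantization pass over a Whisper checkpoint with PyTorch. It is not the gated-attention outlier-reduction method of the paper above, only the baseline PTQ step that such methods aim to improve; the checkpoint name and the use of Hugging Face Transformers are assumptions for illustration.

```python
# Generic post-training dynamic quantization of a Whisper checkpoint (illustrative only;
# NOT the paper's gated-attention outlier-reduction method).
import os
import torch
from transformers import WhisperForConditionalGeneration  # assumed dependency

# Checkpoint name is an assumption; any Whisper-family (or distilled) model works the same way.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.eval()

# Swap every nn.Linear for a dynamically quantized int8 version (CPU inference).
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Compare serialized sizes to see the effect of weight quantization.
torch.save(model.state_dict(), "whisper_fp32.pt")
torch.save(quantized.state_dict(), "whisper_int8.pt")
print(f"fp32: {os.path.getsize('whisper_fp32.pt') / 1e6:.1f} MB")
print(f"int8: {os.path.getsize('whisper_int8.pt') / 1e6:.1f} MB")
```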

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

Z Li, H Xu, T Wang, S Hu, Z Jin, S Hu, J Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose a novel one-pass approach for jointly compressing and quantizing multiple ASR
systems using an all-in-one neural model. A single compression cycle allows multiple …
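
As a rough illustration of the general "one set of shared weights, several precisions" idea, the hypothetical sketch below fake-quantizes a single shared weight matrix at different bit-widths at run time (switchable-precision style). It is not the authors' all-in-one architecture; the class name, the uniform symmetric quantizer, and the chosen bit-widths are assumptions made only for this example.

```python
# Hypothetical switchable-precision layer: one weight matrix, several simulated bit-widths.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Uniform symmetric fake quantization of a weight tensor to num_bits."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

class SwitchablePrecisionLinear(nn.Module):
    """Linear layer whose single shared weight can be evaluated at several precisions."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor, num_bits: int | None = None) -> torch.Tensor:
        w = self.weight if num_bits is None else fake_quantize(self.weight, num_bits)
        return F.linear(x, w, self.bias)

layer = SwitchablePrecisionLinear(256, 256)
x = torch.randn(2, 256)
full_precision = layer(x)          # unquantized path
int8_out = layer(x, num_bits=8)    # same weights, simulated 8-bit
int4_out = layer(x, num_bits=4)    # same weights, simulated 4-bit
```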