Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
This paper explores improving post-training quantization (PTQ) applied after knowledge distillation in the Whisper speech foundation model family. We address the challenge of …
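The abstract is truncated here, so as background only, the following is a minimal sketch of symmetric per-tensor weight PTQ in NumPy, illustrating why weight outliers (the problem the gated-attention method in the title targets) hurt low-bit quantization. The `quantize_tensor`/`dequantize_tensor` helpers and the int8 setting are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def quantize_tensor(w: np.ndarray, num_bits: int = 8):
    """Symmetric per-tensor post-training quantization (illustrative)."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for int8
    scale = np.abs(w).max() / qmax          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_tensor(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# A single large outlier inflates the scale and crushes the resolution
# available to the remaining small weights -- the failure mode that
# outlier-reduction methods aim to mitigate before quantization.
w = np.concatenate([np.random.randn(1023) * 0.02, [8.0]]).astype(np.float32)
q, s = quantize_tensor(w)
print("max abs error:", np.abs(dequantize_tensor(q, s) - w).max())
```

With the 8.0 outlier present, the small weights collapse to zero after rounding; shrinking such outliers (here, via gated attention) lets the same bit budget represent the bulk of the distribution more faithfully.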
One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model
We propose a novel one-pass approach for jointly compressing and quantizing multiple ASR systems using an all-in-one neural model. A single compression cycle allows multiple …
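The snippet above also ends mid-sentence, but the core idea named in the title, one shared "all-in-one" model serving several compressed systems, can be sketched generically. Below is a minimal, assumption-laden illustration in the style of slimmable networks, where one weight matrix is sliced to emulate systems of different sizes after a single training cycle. The `SlimmableLinear` class and `width_frac` parameter are hypothetical stand-ins, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class SlimmableLinear(nn.Module):
    """One set of weights evaluated at several widths (illustrative of a
    weight-sharing 'all-in-one' model; not the paper's exact method)."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor, width_frac: float = 1.0) -> torch.Tensor:
        # Slice the shared weight to emulate a smaller, compressed system.
        out = int(self.weight.shape[0] * width_frac)
        return x @ self.weight[:out].t() + self.bias[:out]

layer = SlimmableLinear(256, 512)
x = torch.randn(4, 256)
# The same parameters serve several target sizes; no per-system retraining.
for frac in (1.0, 0.5, 0.25):
    print(frac, layer(x, frac).shape)
```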