I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

S Hazmoune, F Bougamouza - Engineering Applications of Artificial …, 2024 - Elsevier

Emotion recognition is an aspect of human-computer interaction, affective computing, and
social robotics. Conventional unimodal approaches for emotion recognition, depending on …

Cited by 17 · Related articles

[PDF] arxiv.org

Reproducing whisper-style training using an open-source toolkit and publicly available data

Y Peng, J Tian, B Yan, D Berrebbi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Pre-training speech models on large volumes of data has achieved remarkable success.
OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised …

Cited by 33 · Related articles · All 5 versions

[PDF] arxiv.org

OWSM v3.1: Better and faster open whisper-style speech models based on e-branchformer

Y Peng, J Tian, W Chen, S Arora, B Yan, Y Sudo… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent studies have advocated for fully open foundation models to promote transparency
and open science. As an initial step, the Open Whisper-style Speech Model (OWSM) …

Cited by 24 · Related articles · All 2 versions

[PDF] arxiv.org

COLLD: Contrastive Layer-to-Layer Distillation for Compressing Multilingual Pre-Trained Speech Encoders

HJ Chang, N Dong, R Mavlyutov… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Large-scale self-supervised pre-trained speech encoders outperform conventional
approaches in speech recognition and translation tasks. Due to the high cost of developing …

Cited by 3 · Related articles · All 3 versions

[PDF] mlr.press

Lookahead when it matters: Adaptive non-causal transformers for streaming neural transducers

G Strimel, Y Xie, BJ King, M Radfar… - International …, 2023 - proceedings.mlr.press

Streaming speech recognition architectures are employed for low-latency, real-time
applications. Such architectures are often characterized by their causality. Causal …

CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-Based Speech Recognition

J Hou, P Wang, J Zhang, M Yang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Deploying end-to-end speech recognition models with limited computing resources remains
challenging, despite their impressive performance. Given the gradual increase in model size …

Cited by 1 · Related articles · All 3 versions