Using transformers for multimodal emotion recognition: Taxonomies and state of the art review
S Hazmoune, F Bougamouza - Engineering Applications of Artificial …, 2024 - Elsevier
Emotion recognition is an aspect of human-computer interaction, affective computing, and
social robotics. Conventional unimodal approaches for emotion recognition, depending on …
Reproducing whisper-style training using an open-source toolkit and publicly available data
Pre-training speech models on large volumes of data has achieved remarkable success.
OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised …
OWSM v3.1: Better and faster open whisper-style speech models based on e-branchformer
Recent studies have advocated for fully open foundation models to promote transparency
and open science. As an initial step, the Open Whisper-style Speech Model (OWSM) …
COLLD: Contrastive Layer-to-Layer Distillation for Compressing Multilingual Pre-Trained Speech Encoders
Large-scale self-supervised pre-trained speech encoders outperform conventional
approaches in speech recognition and translation tasks. Due to the high cost of developing …
Lookahead when it matters: Adaptive non-causal transformers for streaming neural transducers
Streaming speech recognition architectures are employed for low-latency, real-time
applications. Such architectures are often characterized by their causality. Causal …
Model and Method for Providing Resilience to Resource-Constrained AI-System
V Moskalenko, V Kharchenko, S Semenov - Sensors, 2024 - mdpi.com
Artificial intelligence technologies are becoming increasingly prevalent in resource-
constrained, safety-critical embedded systems. Numerous methods exist to enhance the …
Speech Recognition Transformers: Topological-lingualism Perspective
Transformers have evolved with great success in various artificial intelligence tasks. Thanks
to our recent prevalence of self-attention mechanisms, which capture long-term …
Improving vision-inspired keyword spotting using dynamic module skipping in streaming conformer encoder
Using a vision-inspired keyword spotting framework, we propose an architecture with input-
dependent dynamic depth capable of processing streaming audio. Specifically, we extend a …
CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-Based Speech Recognition
J Hou, P Wang, J Zhang, M Yang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Deploying end-to-end speech recognition models with limited computing resources remains
challenging, despite their impressive performance. Given the gradual increase in model size …