SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …
Attention-inspired artificial neural networks for speech processing: A systematic review
N Zacarias-Morales, P Pancardo… - Symmetry, 2021 - mdpi.com
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …
On-the-fly data loader and utterance-level aggregation for speaker and language recognition
In this article, our recent efforts on directly modeling utterance-level aggregation for speaker
and language recognition are summarized. First, an on-the-fly data loader for efficient network …
Efficient self-supervised learning representations for spoken language identification
Self-supervised learning has been widely exploited to learn powerful speech
representations. The premise of this paper is that these learned self-supervised …
Cross-UNet: dual-branch infrared and visible image fusion framework based on cross-convolution and attention mechanism
X Wang, Z Hua, J Li - The Visual Computer, 2023 - Springer
Existing infrared and visible image fusion methods suffer from edge information loss, artifact
introduction, and image distortion. Therefore, a dual-branch network model based on the …
Improving deep CNN networks with long temporal context for text-independent speaker verification
Deep CNN networks have shown great success in various tasks for text-independent
speaker recognition. In this paper, we explore two approaches for modeling long temporal …
End-to-end language diarization for bilingual code-switching speech
We propose two end-to-end neural configurations for language diarization on bilingual
code-switching speech. The first, a BLSTM-E2E architecture, includes a set of stacked …
Spoken Language Identification: An overview of past and present research trends
D O'Shaughnessy - Speech Communication, 2024 - Elsevier
Identification of the language used in spoken utterances is useful for multiple applications,
e.g., assist in directing or automating telephone calls, or selecting which language-specific …
PHO-LID: A unified model incorporating acoustic-phonetic and phonotactic information for language identification
We propose a novel model to hierarchically incorporate phoneme and phonotactic
information for language identification (LID) without requiring phoneme annotations for …
Mandarin-English code-switching speech recognition with self-supervised speech representation models
Code-switching (CS) is common in daily conversations where more than one language is
used within a sentence. The difficulties of CS speech recognition lie in alternating languages …