SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

Attention-inspired artificial neural networks for speech processing: A systematic review

N Zacarias-Morales, P Pancardo… - Symmetry, 2021 - mdpi.com
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …

On-the-fly data loader and utterance-level aggregation for speaker and language recognition

W Cai, J Chen, J Zhang, M Li - IEEE/ACM Transactions on …, 2020 - ieeexplore.ieee.org
In this article, our recent efforts on directly modeling utterance-level aggregation for speaker
and language recognition are summarized. First, an on-the-fly data loader for efficient network …

Efficient self-supervised learning representations for spoken language identification

H Liu, LPG Perera, AWH Khong… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Self-supervised learning has been widely exploited to learn powerful speech
representations. The premise of this paper is that these learned self-supervised …

Cross-UNet: dual-branch infrared and visible image fusion framework based on cross-convolution and attention mechanism

X Wang, Z Hua, J Li - The Visual Computer, 2023 - Springer
Existing infrared and visible image fusion methods suffer from edge information loss, artifact
introduction, and image distortion. Therefore, a dual-branch network model based on the …

Improving deep CNN networks with long temporal context for text-independent speaker verification

Y Zhao, T Zhou, Z Chen, J Wu - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Deep CNN networks have shown great success in various tasks for text-independent
speaker recognition. In this paper, we explore two approaches for modeling long temporal …

End-to-end language diarization for bilingual code-switching speech

H Liu, LPG Perera, X Zhang, J Dauwels… - … Conference of the …, 2021 - research.tudelft.nl
We propose two end-to-end neural configurations for language diarization on bilingual code-
switching speech. The first, a BLSTM-E2E architecture, includes a set of stacked …

Spoken Language Identification: An overview of past and present research trends

D O'Shaughnessy - Speech Communication, 2024 - Elsevier
Identification of the language used in spoken utterances is useful for multiple applications,
e.g., assist in directing or automating telephone calls, or selecting which language-specific …

PHO-LID: A unified model incorporating acoustic-phonetic and phonotactic information for language identification

H Liu, LPG Perera, AWH Khong, SJ Styles… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose a novel model to hierarchically incorporate phoneme and phonotactic
information for language identification (LID) without requiring phoneme annotations for …

Mandarin-English code-switching speech recognition with self-supervised speech representation models

LH Tseng, YK Fu, HJ Chang, H Lee - arXiv preprint arXiv:2110.03504, 2021 - arxiv.org
Code-switching (CS) is common in daily conversations where more than one language is
used within a sentence. The difficulties of CS speech recognition lie in alternating languages …