Utterance-level end-to-end language identification using attention-based CNN-BLSTM

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org

What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

Cited by 92

Attention-inspired artificial neural networks for speech processing: A systematic review

N Zacarias-Morales, P Pancardo… - Symmetry, 2021 - mdpi.com

Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …

Cited by 27

On-the-fly data loader and utterance-level aggregation for speaker and language recognition

W Cai, J Chen, J Zhang, M Li - IEEE/ACM Transactions on …, 2020 - ieeexplore.ieee.org

In this article, our recent efforts on directly modeling utterance-level aggregation for speaker
and language recognition are summarized. First, an on-the-fly data loader for efficient network …

Cited by 96

Efficient self-supervised learning representations for spoken language identification

H Liu, LPG Perera, AWH Khong… - IEEE Journal of …, 2022 - ieeexplore.ieee.org

Self-supervised learning has been widely exploited to learn powerful speech
representations. The premise of this paper is that these learned self-supervised …

Cited by 22

Cross-UNet: dual-branch infrared and visible image fusion framework based on cross-convolution and attention mechanism

X Wang, Z Hua, J Li - The Visual Computer, 2023 - Springer

Existing infrared and visible image fusion methods suffer from edge information loss, artifact
introduction, and image distortion. Therefore, a dual-branch network model based on the …

Cited by 21

Improving deep CNN networks with long temporal context for text-independent speaker verification

Y Zhao, T Zhou, Z Chen, J Wu - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org

Deep CNN networks have shown great success in various tasks for text-independent
speaker recognition. In this paper, we explore two approaches for modeling long temporal …

Cited by 49

End-to-end language diarization for bilingual code-switching speech

H Liu, LPG Perera, X Zhang, J Dauwels… - … Conference of the …, 2021 - research.tudelft.nl

We propose two end-to-end neural configurations for language diarization on bilingual
code-switching speech. The first, a BLSTM-E2E architecture, includes a set of stacked …

Cited by 27

Spoken Language Identification: An overview of past and present research trends

D O'Shaughnessy - Speech Communication, 2024 - Elsevier

Identification of the language used in spoken utterances is useful for multiple applications,
e.g., assist in directing or automating telephone calls, or selecting which language-specific …

PHO-LID: A unified model incorporating acoustic-phonetic and phonotactic information for language identification

H Liu, LPG Perera, AWH Khong, SJ Styles… - arXiv preprint arXiv …, 2022 - arxiv.org

We propose a novel model to hierarchically incorporate phoneme and phonotactic
information for language identification (LID) without requiring phoneme annotations for …

Cited by 15

Mandarin-English code-switching speech recognition with self-supervised speech representation models

LH Tseng, YK Fu, HJ Chang, H Lee - arXiv preprint arXiv:2110.03504, 2021 - arxiv.org

Code-switching (CS) is common in daily conversations where more than one language is
used within a sentence. The difficulties of CS speech recognition lie in alternating languages …

Cited by 18