Fast multi-language LSTM-based online handwriting recognition
We describe an online handwriting system that is able to support 102 languages using a
deep neural network architecture. This new system has completely replaced our previous …
Language ID in the wild: Unexpected challenges on the path to a thousand-language web text corpus
Large text corpora are increasingly important for a wide variety of Natural Language
Processing (NLP) tasks, and automatic language identification (LangID) is a core technology …
Writing system and speaker metadata for 2,800+ language varieties
D van Esch, T Lucassen, S Ruder… - Proceedings of the …, 2022 - aclanthology.org
We describe an open-source dataset providing metadata for about 2,800 language varieties
used in the world today. Specifically, the dataset provides the attested writing system(s) for …
No data to crawl? Monolingual corpus creation from PDF files of truly low-resource languages in Peru
G Bustamante, A Oncevay… - Proceedings of the Twelfth …, 2020 - aclanthology.org
We introduce new monolingual corpora for four indigenous and endangered languages
from Peru: Shipibo-Konibo, Ashaninka, Yanesha and Yine. Given the total absence of these …
Writing across the world's languages: Deep internationalization for Gboard, the Google keyboard
D van Esch, E Sarbar, T Lucassen, J O'Brien… - arXiv preprint arXiv …, 2019 - arxiv.org
This technical report describes our deep internationalization program for Gboard, the
Google Keyboard. Today, Gboard supports 900+ language varieties across 70+ writing …
IndyLSTMs: Independently Recurrent LSTMs
P Gonnet, T Deselaers - ICASSP 2020 - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
We introduce Independently Recurrent Long Short-term Memory cells: IndyLSTMs. These
differ from regular LSTM cells in that the recurrent weights are not modeled as a full matrix …
Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data
When building automatic speech recognition (ASR) systems, typically some amount of audio
and text data in the target language is needed. While text data can be obtained relatively …
Unified Verbalization for Speech Recognition & Synthesis Across Languages
We describe a new approach to converting written tokens to their spoken form, which can be
shared by automatic speech recognition (ASR) and text-to-speech synthesis (TTS) systems …
Developing Pronunciation Models in New Languages Faster by Exploiting Common Grapheme-to-Phoneme Correspondences Across Languages
H Bleyan, S Ritchie, JF Mortensen, D van Esch - INTERSPEECH, 2019 - academia.edu
We discuss two methods that let us easily create grapheme-to-phoneme (G2P) conversion
systems for languages without any human-curated pronunciation lexicons, as long as we …
Now You See Me, Now You Don't: 'Poverty of the Stimulus' Problems and Arbitrary Correspondences in End-to-End Speech Models
D van Esch - Proceedings of the Second Workshop on …, 2024 - aclanthology.org
End-to-end models for speech recognition and speech synthesis have many benefits, but we
argue they also face a unique set of challenges not encountered in conventional multi-stage …