Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0
In this work, we describe the first public version of the Morfessor software, which is a
program that takes as input a corpus of unannotated text and produces a segmentation of …
Inducing the morphological lexicon of a natural language from unannotated text
MJP Creutz, KH Lagus - International and Interdisciplinary …, 2005 - researchportal.helsinki.fi
This work presents an algorithm for the unsupervised learning, or induction, of a simple
morphology of a natural language. A probabilistic maximum a posteriori model is utilized …
Morph-based speech recognition and modeling of out-of-vocabulary words across languages
We explore the use of morph-based language models in large-vocabulary continuous-
speech recognition systems across four so-called morphologically rich languages: Finnish …
Unlimited vocabulary speech recognition with morph language models applied to Finnish
In the speech recognition of highly inflecting or compounding languages, the traditional
word-based language modeling is problematic. As the number of distinct word forms can …
Turkish broadcast news transcription and retrieval
This paper summarizes our recent efforts for building a Turkish Broadcast News transcription
and retrieval system. The agglutinative nature of Turkish leads to a high number of out-of …
Importance of high-order n-gram models in morph-based speech recognition
T Hirsimaki, J Pylkkonen… - IEEE Transactions on …, 2009 - ieeexplore.ieee.org
Speech recognition systems trained for morphologically rich languages face the problem of
vocabulary growth caused by prefixes, suffixes, inflections, and compound words. Solutions …
Automatic speech recognition for under-resourced languages: application to Vietnamese language
VB Le, L Besacier - IEEE Transactions on Audio, Speech, and …, 2009 - ieeexplore.ieee.org
This paper presents our work in automatic speech recognition (ASR) in the context of under-
resourced languages with application to Vietnamese. Different techniques for bootstrapping …
Unlimited vocabulary speech recognition for agglutinative languages
It is practically impossible to build a word-based lexicon for speech recognition in
agglutinative languages that would cover all the relevant words. The problem is that words …
Highly accurate children's speech recognition for interactive reading tutors using subword units
Speech technology offers great promise in the field of automated literacy and reading tutors
for children. In such applications speech recognition can be used to track the reading …
Induction of a simple morphology for highly-inflecting languages
MJP Creutz, KH Lagus - 7th Meeting of the ACL Special …, 2004 - researchportal.helsinki.fi
This paper presents an algorithm for the unsupervised learning of a simple morphology of a
natural language from raw text. A generative probabilistic model is applied to segment word …