Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0

M Creutz, K Lagus - … from text corpora using Morfessor 1.0, 2005 - researchportal.helsinki.fi
In this work, we describe the first public version of the Morfessor software, which is a
program that takes as input a corpus of unannotated text and produces a segmentation of …

Inducing the morphological lexicon of a natural language from unannotated text

MJP Creutz, KH Lagus - International and Interdisciplinary …, 2005 - researchportal.helsinki.fi
This work presents an algorithm for the unsupervised learning, or induction, of a simple
morphology of a natural language. A probabilistic maximum a posteriori model is utilized …

Morph-based speech recognition and modeling of out-of-vocabulary words across languages

M Creutz, T Hirsimäki, M Kurimo, A Puurula… - ACM Transactions on …, 2007 - dl.acm.org
We explore the use of morph-based language models in large-vocabulary continuous-
speech recognition systems across four so-called morphologically rich languages: Finnish …

Unlimited vocabulary speech recognition with morph language models applied to Finnish

T Hirsimäki, M Creutz, V Siivola, M Kurimo… - Computer Speech & …, 2006 - Elsevier
In the speech recognition of highly inflecting or compounding languages, the traditional
word-based language modeling is problematic. As the number of distinct word forms can …

Turkish broadcast news transcription and retrieval

E Arisoy, D Can, S Parlak, H Sak… - IEEE Transactions on …, 2009 - ieeexplore.ieee.org
This paper summarizes our recent efforts for building a Turkish Broadcast News transcription
and retrieval system. The agglutinative nature of Turkish leads to a high number of out-of …

Importance of high-order n-gram models in morph-based speech recognition

T Hirsimäki, J Pylkkönen… - IEEE Transactions on …, 2009 - ieeexplore.ieee.org
Speech recognition systems trained for morphologically rich languages face the problem of
vocabulary growth caused by prefixes, suffixes, inflections, and compound words. Solutions …

Automatic speech recognition for under-resourced languages: application to Vietnamese language

VB Le, L Besacier - IEEE Transactions on Audio, Speech, and …, 2009 - ieeexplore.ieee.org
This paper presents our work in automatic speech recognition (ASR) in the context of under-
resourced languages with application to Vietnamese. Different techniques for bootstrapping …

Unlimited vocabulary speech recognition for agglutinative languages

M Kurimo, A Puurula, E Arisoy, V Siivola… - Proceedings of the …, 2006 - aclanthology.org
It is practically impossible to build a word-based lexicon for speech recognition in
agglutinative languages that would cover all the relevant words. The problem is that words …

Highly accurate children's speech recognition for interactive reading tutors using subword units

A Hagen, B Pellom, R Cole - Speech Communication, 2007 - Elsevier
Speech technology offers great promise in the field of automated literacy and reading tutors
for children. In such applications speech recognition can be used to track the reading …

Induction of a simple morphology for highly-inflecting languages

MJP Creutz, KH Lagus - 7th Meeting of the ACL Special …, 2004 - researchportal.helsinki.fi
This paper presents an algorithm for the unsupervised learning of a simple morphology of a
natural language from raw text. A generative probabilistic model is applied to segment word …