Expanding n-gram training data for language models based on morpho-syntactic transformations

L Verwimp, J Pelemans, H van Hamme… - Computational Linguistics in the Netherlands Journal, 2015 - clinjournal.org
Abstract
The subject of this paper is the expansion of n-gram training data with the aid of morphosyntactic transformations, in order to create a larger amount of reliable n-grams for Dutch language models. The main aim of this technique is to alleviate a classical problem for language models: data sparsity. Moreover, since language models for automatic speech recognition are usually trained on written language resources while they are tested on spoken language, certain patterns that are typical for spontaneous spoken language will be under-represented and patterns characteristic of written language will be over-represented. By adding transformed n-grams, we hope to adapt the language model such that it better matches spoken language. We investigate whether a language model trained on the expanded data performs better than a baseline n-gram model with modified Kneser-Ney smoothing in terms of perplexity and word error rate. Several alternatives for the probability estimation of the transformed n-grams are explored, and an approach to deal with separable verbs in Dutch is also discussed.
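The core idea of the abstract can be illustrated with a minimal sketch: apply a transformation rule to observed n-grams and add the transformed variants to the counts with a reduced weight. The transformation rule, the example n-grams, and the fixed down-weighting factor below are all hypothetical simplifications; the paper explores several alternatives for estimating the probability of transformed n-grams, none of which is reproduced here.

```python
from collections import Counter

def expand_ngrams(ngram_counts, transform, weight=0.5):
    """Add a transformed variant of each n-gram with a down-weighted count.

    ngram_counts : Counter mapping n-gram tuples to counts
    transform    : function mapping an n-gram to a transformed n-gram,
                   or None if the rule does not apply
    weight       : fraction of the original count assigned to the new
                   n-gram (one simple choice among many possible schemes)
    """
    expanded = Counter(ngram_counts)
    for ngram, count in ngram_counts.items():
        new = transform(ngram)
        if new is not None and new not in ngram_counts:
            expanded[new] += count * weight
    return expanded

# Hypothetical rule loosely inspired by Dutch separable verbs: in spoken
# word order the particle can split off from the verb stem, e.g. the
# infinitive "opbellen" surfacing as "bel ... op". This toy rule only
# handles one hard-coded trigram and is purely illustrative.
def split_separable_verb(ngram):
    if ngram == ("ik", "wil", "opbellen"):
        return ("ik", "bel", "op")
    return None

counts = Counter({("ik", "wil", "opbellen"): 4, ("ik", "zie", "hem"): 2})
expanded = expand_ngrams(counts, split_separable_verb)
# The transformed trigram now appears with half the original count.
```

The design choice sketched here, giving transformed n-grams a fraction of the source n-gram's count before smoothing, is only one way to fold the expanded data into estimation; the paper compares several such strategies against a modified Kneser-Ney baseline.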