Show some love to your n-grams: A bit of progress and stronger n-gram language modeling baselines

D Oralbekova, O Mamyrbayev, M Othman… - Applied Sciences, 2023 - mdpi.com

This article provides a comprehensive survey of contemporary language modeling
approaches within the realm of natural language processing (NLP) tasks. This paper …

Cited by 14 · All 3 versions

[PDF] rwth-aachen.de

[PDF] Advancing neural language modeling in automatic speech recognition

K Irie - 2020 - publications.rwth-aachen.de

Statistical language modeling is one of the fundamental problems in natural language
processing. In recent years, language modeling has seen great advances by active …

Cited by 14 · All 5 versions

[PDF] srce.hr

N-gram Based Croatian Language Network: Application in a Smart Environment

R Šoić, M Vuković - Journal of communications software and systems, 2022 - hrcak.srce.hr

In the field of natural language processing, language networks represent a method
for observing linguistic units and their interactions in different linguistic contexts. This paper …

Cited by 5 · All 8 versions

[PDF] cell.com

Augmented-syllabification of n-gram tagger for Indonesian words and named-entities

S Suyanto, A Sunyoto, RN Ismail, A Romadhony… - Heliyon, 2022 - cell.com

As one of the statistical-based models, an n-gram syllabification commonly gives a high
syllable error rate (SER) for Bahasa Indonesia, one of the low-resource languages, since it …

Cited by 2 · All 7 versions

[PDF] arxiv.org

Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation

Y Zhou, M Keuper, M Fritz - arXiv preprint arXiv:2408.13586, 2024 - arxiv.org

Sampling-based decoding strategies have been widely adopted for Large Language
Models (LLMs) in numerous applications, which target a balance between diversity and …
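The snippet only names sampling-based decoding in general. As one concrete instance of such a strategy (an illustrative choice, not necessarily the method or parameter the paper recommends), nucleus or top-p truncation shows where the diversity/risk trade-off enters: only the most probable tokens covering a cumulative mass p are kept. The function names and the toy distribution below are hypothetical.

```python
from __future__ import annotations

import random


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Nucleus (top-p) truncation: keep the smallest set of most probable
    tokens whose cumulative probability reaches p, then renormalize."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: list[tuple[str, float]] = []
    mass = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        mass += prob
        if mass >= p:
            break
    return {token: prob / mass for token, prob in kept}


def sample_token(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one token from a (renormalized) distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    # Toy next-token distribution (made-up numbers for illustration only).
    next_token = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
    nucleus = top_p_filter(next_token, p=0.75)
    print(nucleus)  # only "the" and "a" survive the truncation
    print(sample_token(nucleus, random.Random(0)))
```

Lowering p trims the low-probability tail more aggressively (less diversity, less risk of incoherent tokens), while p = 1.0 reduces to plain ancestral sampling.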

Sleep Model: A Sequence Model for Predicting the Next Sleep Stage

I Choi, W Sung - 2023 IEEE Biomedical Circuits and Systems …, 2023 - ieeexplore.ieee.org

As sleep disorders are becoming more prevalent, there is an urgent need to classify sleep
stages in a less disturbing way. In particular, sleep-stage classification using simple sensors …

Cited by 2 · All 3 versions

[PDF] ieee.org

Data augmentation methods for low-resource orthographic syllabification

S Suyanto, KM Lhaksmana, MA Bijaksana… - IEEE …, 2020 - ieeexplore.ieee.org

An n-gram syllabification model generally produces a high error rate for a low-resource
language, such as Indonesian, because of the high rate of out-of-vocabulary (OOV) n-grams …

Cited by 5 · All 2 versions
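The snippet attributes the high error rate to out-of-vocabulary n-grams. A minimal sketch of that effect, assuming character n-grams over words padded with a `#` boundary marker (the padding, helper names, and toy Indonesian words are illustrative assumptions, not the paper's setup), measures the share of test n-grams never seen in training; with little training data the share grows quickly as n increases.

```python
from __future__ import annotations


def char_ngrams(word: str, n: int) -> list[str]:
    """Character n-grams of a word, padded with a hypothetical '#' boundary marker."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def oov_ngram_rate(train_words: list[str], test_words: list[str], n: int) -> float:
    """Fraction of test n-gram tokens that never occur in the training words."""
    seen = {g for w in train_words for g in char_ngrams(w, n)}
    test_grams = [g for w in test_words for g in char_ngrams(w, n)]
    if not test_grams:
        return 0.0
    unseen = sum(1 for g in test_grams if g not in seen)
    return unseen / len(test_grams)


if __name__ == "__main__":
    train = ["makan", "minum"]  # tiny toy training set ("eat", "drink")
    test = ["makanan"]          # unseen derived word ("food")
    for n in (2, 3, 4):
        # The OOV rate rises with n: 1/8, then 2/7, then 3/6.
        print(n, oov_ngram_rate(train, test, n))
```

Longer n-grams carry more context for the tagger but are sparser, which is the tension the data augmentation in this line of work aims to ease.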

[PDF] arxiv.org

Improving low compute language modeling with in-domain embedding initialisation

C Welch, R Mihalcea, JK Kummerfeld - arXiv preprint arXiv:2009.14109, 2020 - arxiv.org

Many NLP applications, such as biomedical data and technical support, have 10-100 million
tokens of in-domain data and limited computational resources for learning from it. How …

Cited by 5 · All 3 versions

Indonesian Graphemic Syllabification Using n-Gram Tagger with State-Elimination

RN Ismail, S Suyanto - 2020 8th International Conference on …, 2020 - ieeexplore.ieee.org

Syllabification can be approached with either grapheme- or phoneme-based methods. Graphemic
syllabification is simpler than phonemic syllabification since it does not require grapheme-to …

Cited by 4

[PDF] arxiv.org

Efficient MDI adaptation for n-gram language models

R Huang, K Li, A Arora, D Povey… - arXiv preprint arXiv …, 2020 - arxiv.org

This paper presents an efficient algorithm for n-gram language model adaptation under the
minimum discrimination information (MDI) principle, where an out-of-domain language …

Cited by 3 · All 12 versions
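The snippet names the MDI principle without showing the update. A rough sketch of its best-known special case, unigram rescaling (also called fast marginal adaptation), is P_A(w|h) = P_B(w|h) · α(w) / Z(h) with α(w) = (P_in(w) / P_B(w))^β. This is a swapped-in textbook variant on toy dictionary distributions, not the efficient algorithm Huang et al. propose; the function and argument names are hypothetical.

```python
from __future__ import annotations


def unigram_rescale(
    p_bg: dict[str, dict[str, float]],
    bg_unigram: dict[str, float],
    in_unigram: dict[str, float],
    beta: float = 0.5,
) -> dict[str, dict[str, float]]:
    """Scale each background conditional P_B(w|h) by
    alpha(w) = (P_in(w) / P_B(w)) ** beta and renormalize per history h."""
    adapted = {}
    for history, dist in p_bg.items():
        scaled = {
            w: p * (in_unigram[w] / bg_unigram[w]) ** beta for w, p in dist.items()
        }
        z = sum(scaled.values())  # per-history normalizer Z(h)
        adapted[history] = {w: s / z for w, s in scaled.items()}
    return adapted


if __name__ == "__main__":
    # Toy out-of-domain bigram model and unigram marginals (made-up numbers).
    background = {"the": {"cat": 0.5, "dog": 0.5}}
    bg_uni = {"cat": 0.5, "dog": 0.5}
    in_uni = {"cat": 0.8, "dog": 0.2}  # in-domain data favors "cat"
    print(unigram_rescale(background, bg_uni, in_uni, beta=1.0))
```

Here β = 0 leaves the background model unchanged and larger β pulls it toward the in-domain marginals. Computing Z(h) for every history is the costly step in a full back-off model, which is the kind of work an efficient adaptation algorithm has to avoid.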