Contemporary approaches in evolving language models

D Oralbekova, O Mamyrbayev, M Othman… - Applied Sciences, 2023 - mdpi.com
This article provides a comprehensive survey of contemporary language modeling
approaches within the realm of natural language processing (NLP) tasks. This paper …

Advancing neural language modeling in automatic speech recognition

K Irie - 2020 - publications.rwth-aachen.de
Statistical language modeling is one of the fundamental problems in natural language
processing. In recent years, language modeling has seen great advances by active …

N-gram Based Croatian Language Network: Application in a Smart Environment

R Šoić, M Vuković - Journal of communications software and systems, 2022 - hrcak.srce.hr
In the field of natural language processing, language networks represent a method
for observing linguistic units and their interactions in different linguistic contexts. This paper …

Augmented-syllabification of n-gram tagger for Indonesian words and named-entities

S Suyanto, A Sunyoto, RN Ismail, A Romadhony… - Heliyon, 2022 - cell.com
As one of the statistical-based models, an n-gram syllabification commonly gives a high
syllable error rate (SER) for Bahasa Indonesia, one of the low-resource languages, since it …

Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation

Y Zhou, M Keuper, M Fritz - arXiv preprint arXiv:2408.13586, 2024 - arxiv.org
Sampling-based decoding strategies have been widely adopted for Large Language
Models (LLMs) in numerous applications, which target a balance between diversity and …

Sleep Model: A Sequence Model for Predicting the Next Sleep Stage

I Choi, W Sung - 2023 IEEE Biomedical Circuits and Systems …, 2023 - ieeexplore.ieee.org
As sleep disorders are becoming more prevalent, there is an urgent need to classify sleep
stages in a less disturbing way. In particular, sleep-stage classification using simple sensors …

Data augmentation methods for low-resource orthographic syllabification

S Suyanto, KM Lhaksmana, MA Bijaksana… - IEEE …, 2020 - ieeexplore.ieee.org
An n-gram syllabification model generally produces a high error rate for a low-resource
language, such as Indonesian, because of the high rate of out-of-vocabulary (OOV) n-grams …

Improving low compute language modeling with in-domain embedding initialisation

C Welch, R Mihalcea, JK Kummerfeld - arXiv preprint arXiv:2009.14109, 2020 - arxiv.org
Many NLP applications, such as biomedical data and technical support, have 10-100 million
tokens of in-domain data and limited computational resources for learning from it. How …

Indonesian Graphemic Syllabification Using n-Gram Tagger with State-Elimination

RN Ismail, S Suyanto - 2020 8th International Conference on …, 2020 - ieeexplore.ieee.org
Syllabification can be approached using either grapheme-based or phoneme-based methods. Graphemic
syllabification is simpler than phonemic syllabification since it does not require grapheme-to …

Efficient MDI adaptation for n-gram language models

R Huang, K Li, A Arora, D Povey… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents an efficient algorithm for n-gram language model adaptation under the
minimum discrimination information (MDI) principle, where an out-of-domain language …