Correcting diacritics and typos with a ByT5 transformer model

L Stankevičius, M Lukoševičius, J Kapočiūtė-Dzikienė… - Applied Sciences, 2022 - mdpi.com
Due to the fast pace of life and online communications and the prevalence of English and
the QWERTY keyboard, people tend to forgo using diacritics, make typographical errors …

Diacritics generation and application in hate speech detection on Vietnamese social networks

P Le-Hong - Knowledge-Based Systems, 2021 - Elsevier
One of the challenging problems in text processing is diacritics generation where one needs
to generate diacritic marks for non-accented text. With an ever increasing amount of informal …

How much does lookahead matter for disambiguation? partial arabic diacritization case study

S Esmail, K Bar, N Dershowitz - Computational Linguistics, 2022 - direct.mit.edu
We suggest a model for partial diacritization of deep orthographies. We focus on Arabic,
where the optional indication of selected vowels by means of diacritics can resolve …

[PDF][PDF] Integrating diacritics restoration and question classification into vietnamese question answering system

BT Hung - Adv. Sci. Technol. Eng. Syst. J, 2019 - researchgate.net
This paper presents a solution for question answering system for Vietnamese language by
integrating diacritics restoration and question classification via deep learning approach. It …

Improvement of intent classification using diacritic restoration for text message in chatbot

T Nguyen, M Shcherbakov - Creativity in Intelligent Technologies and Data …, 2021 - Springer
Today, chatbots or conversational agents are increasingly being used in various fields. In
recent years, governments, organizations, and businesses have invested in dialogue …

[PDF][PDF] TDP–A hybrid diacritic restoration with transformer decoder

TDA Dang, TTT Nguyen - Proceedings of the 34th Pacific Asia …, 2020 - aclanthology.org
Diacritic restoration plays an important role in Natural Language Processing (NLP) for many
diacritical languages such as Vietnamese, Czech, Hungarian, etc. With the development of …

Automatic Diacritic Recovery with focus on the Quality of the training Corpus for Resource-scarce Languages

II Ayogu, O Abu - 2020 IEEE 2nd International Conference on …, 2021 - ieeexplore.ieee.org
The development and availability of high quality corpus for many African languages is still
hampered by dearth of appropriate software tools and devices. To be able to rapidly create …

College admissions counseling using intelligent question answering system

BT Hung - … Systems and Sustainable Computing: Proceedings of …, 2022 - Springer
College admission is successful based on admission counseling for college entrance
examinations. To understand the user's intentions and provide students and parents with all …

Diacritics correction in Turkish with context-aware sequence to sequence modeling

AT Özge, Ö Bozal, U Özge - Turkish Journal of Electrical …, 2022 - journals.tubitak.gov.tr
Digital texts in many languages have examples of missing or misused diacritics which
makes it hard for natural language processing applications to disambiguate the meaning of …

[PDF][PDF] The Rise and Fall and Rise again of the Seventh Letter: A Technological Story of the Russian Alphabet

A Doroshev, A Polyakova - Technology and Language, 2023 - soctech.spbstu.ru
It is generally understood that the letter ё is not obligatory in the rules of Russian language–
even though there are many names and words that include it. This lack of obligation led to …