SeaLLMs - Large Language Models for Southeast Asia

XP Nguyen, W Zhang, X Li, M Aljunied, Z Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the remarkable achievements of large language models (LLMs) in various tasks,
there remains a linguistic bias that favors high-resource languages, such as English, often at …

Llama beyond English: An empirical study on language capability transfer

J Zhao, Z Zhang, L Gao, Q Zhang, T Gui… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent times, substantial advancements have been witnessed in large language models
(LLMs), exemplified by ChatGPT, showcasing remarkable proficiency across a range of …

TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese

NK Corrêa, S Falk, S Fatimah, A Sen… - Machine Learning with …, 2024 - Elsevier
Large language models (LLMs) have significantly advanced natural language processing,
but their progress has yet to be equal across languages. While most LLMs are trained in …

Native vs non-native language prompting: A comparative analysis

MB Kmainasi, R Khan, AE Shahroor, B Bendou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown remarkable abilities in different fields, including
standard Natural Language Processing (NLP) tasks. To elicit knowledge from LLMs …

Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language

R Paul, H Buckchash, S Parida, DK Prasad - arXiv preprint arXiv …, 2024 - arxiv.org
Sámi, an indigenous language group comprising multiple languages, faces digital
marginalization due to the limited availability of data and sophisticated language models …

Are emotions conveyed across machine translations? Establishing an analytical process for the effectiveness of multilingual sentiment analysis with Italian text

R Anderson, C Scala, J Samuel, V Kumar… - Journal of Big Data and …, 2024 - jbdai.org
Natural language processing (NLP) is being widely used globally for a variety of value-
creation tasks ranging from chat-bots and machine translations to sentiment and topic …

Data-Centric AI in the Age of Large Language Models

X Xu, Z Wu, R Qiao, A Verma, Y Shu, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
This position paper proposes a data-centric viewpoint of AI research, focusing on large
language models (LLMs). We start by making the key observation that data is instrumental in …

Position Paper: Data-Centric AI in the Age of Large Language Models

X Xu, Z Wu, R Qiao, A Verma, Y Shu… - Findings of the …, 2024 - aclanthology.org
This position paper proposes a data-centric viewpoint of AI research, focusing on large
language models (LLMs). We start by making a key observation that data is instrumental in …

How Language, Culture, and Geography shape Online Dialogue: Insights from Koo

A Mekacher, M Falkenberg, A Baronchelli - arXiv preprint arXiv …, 2024 - arxiv.org
Koo is a microblogging platform based in India launched in 2020 with the explicit aim of
catering to non-Western communities in their vernacular languages. With a near-complete …

Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs

AE Mekki, M Abdul-Mageed - arXiv preprint arXiv:2410.11006, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive performance on a wide
range of natural language processing (NLP) tasks, primarily through in-context learning …