BLOOM: A 176B-parameter open-access multilingual language model

T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow… - 2023 - inria.hal.science
Large language models (LLMs) have been shown to be able to perform new tasks based on
a few demonstrations or natural language instructions. While these capabilities have led to …

The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset

H Laurençon, L Saulnier, T Wang… - Advances in …, 2022 - proceedings.neurips.cc
As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …

A comprehensive survey on various fully automatic machine translation evaluation metrics

S Chauhan, P Daniel - Neural Processing Letters, 2023 - Springer
The fast advancement in machine translation models necessitates the development of
accurate evaluation metrics that would allow researchers to track the progress in text …

A review of Bangla natural language processing tasks and the utility of transformer models

F Alam, A Hasan, T Alam, A Khan, J Tajrin… - arXiv preprint arXiv …, 2021 - arxiv.org
Bangla--ranked as the 6th most widely spoken language across the world
(https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers--is still …

Re-contextualizing fairness in NLP: The case of India

S Bhatt, S Dev, P Talukdar, S Dave… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent research has revealed undesirable biases in NLP data and models. However, these
efforts focus on social disparities in West, and are not directly portable to other geo-cultural …

iNLTK: Natural language toolkit for Indic languages

G Arora - arXiv preprint arXiv:2009.12534, 2020 - arxiv.org
We present iNLTK, an open-source NLP library consisting of pre-trained language models
and out-of-the-box support for Data Augmentation, Textual Similarity, Sentence …

Sentiment analysis using XLM-R transformer and zero-shot transfer learning on resource-poor Indian language

A Kumar, VHC Albuquerque - Transactions on Asian and Low-Resource …, 2021 - dl.acm.org
Sentiment analysis on social media relies on comprehending the natural language and
using a robust machine learning technique that learns multiple layers of representations or …

HOTTEST: Hate and offensive content identification in Tamil using transformers and enhanced stemming

R Rajalakshmi, S Selvaraj, P Vasudevan - Computer Speech & …, 2023 - Elsevier
Offensive content or hate speech is defined as any form of communication that aims to
annoy, harass, disturb, or anger an individual or community based on factors such as faith …

Fighting hate speech from bilingual Hinglish speaker's perspective, a transformer- and translation-based approach

S Biradar, S Saumya, A Chauhan - Social Network Analysis and Mining, 2022 - Springer
Many people have begun to use social media platforms due to the increased use of the
Internet over the previous decade. It has a lot of benefits, but it also comes with a lot of risks …

Indic-transformers: An analysis of transformer language models for Indian languages

K Jain, A Deshpande, K Shridhar, F Laumann… - arXiv preprint arXiv …, 2020 - arxiv.org
Language models based on the Transformer architecture have achieved state-of-the-art
performance on a wide range of NLP tasks such as text classification, question-answering …