Ai4bharat-indicnlp corpus: Monolingual corpora and word embeddings for indic languages

T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow… - 2023 - inria.hal.science

Large language models (LLMs) have been shown to be able to perform new tasks based on
a few demonstrations or natural language instructions. While these capabilities have led to …

Cited by 1444 - Related articles - All 16 versions

[PDF] neurips.cc

The bigscience roots corpus: A 1.6 tb composite multilingual dataset

H Laurençon, L Saulnier, T Wang… - Advances in …, 2022 - proceedings.neurips.cc

As language models grow ever larger, the need for large-scale high-quality text datasets has
never been more pressing, especially in multilingual settings. The BigScience workshop, a 1 …

Cited by 143 - Related articles - All 21 versions

A comprehensive survey on various fully automatic machine translation evaluation metrics

S Chauhan, P Daniel - Neural Processing Letters, 2023 - Springer

The fast advancement in machine translation models necessitates the development of
accurate evaluation metrics that would allow researchers to track the progress in text …

Cited by 34 - Related articles - All 2 versions

[PDF] arxiv.org

A review of bangla natural language processing tasks and the utility of transformer models

F Alam, A Hasan, T Alam, A Khan, J Tajrin… - arXiv preprint arXiv …, 2021 - arxiv.org

Bangla--ranked as the 6th most widely spoken language across the world
(https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers--is still …

Cited by 22 - Related articles - All 4 versions

[PDF] arxiv.org

Re-contextualizing fairness in NLP: The case of India

S Bhatt, S Dev, P Talukdar, S Dave… - arXiv preprint arXiv …, 2022 - arxiv.org

Recent research has revealed undesirable biases in NLP data and models. However, these
efforts focus on social disparities in West, and are not directly portable to other geo-cultural …

Cited by 33 - Related articles - All 5 versions

[PDF] arxiv.org

inltk: Natural language toolkit for indic languages

G Arora - arXiv preprint arXiv:2009.12534, 2020 - arxiv.org

We present iNLTK, an open-source NLP library consisting of pre-trained language models
and out-of-the-box support for Data Augmentation, Textual Similarity, Sentence …

Cited by 74 - Related articles - All 3 versions

[PDF] mmu.ac.uk

Sentiment analysis using XLM-R transformer and zero-shot transfer learning on resource-poor indian language

A Kumar, VHC Albuquerque - Transactions on Asian and Low-Resource …, 2021 - dl.acm.org

Sentiment analysis on social media relies on comprehending the natural language and
using a robust machine learning technique that learns multiple layers of representations or …

Cited by 56 - Related articles - All 3 versions

Hottest: Hate and offensive content identification in Tamil using transformers and enhanced stemming

R Rajalakshmi, S Selvaraj, P Vasudevan - Computer Speech & …, 2023 - Elsevier

Offensive content or hate speech is defined as any form of communication that aims to
annoy, harass, disturb, or anger an individual or community based on factors such as faith …

Cited by 24 - Related articles - All 2 versions

[PDF] springer.com

Fighting hate speech from bilingual hinglish speaker's perspective, a transformer-and translation-based approach.

S Biradar, S Saumya, A Chauhan - Social Network Analysis and Mining, 2022 - Springer

Many people have begun to use social media platforms due to the increased use of the
Internet over the previous decade. It has a lot of benefits, but it also comes with a lot of risks …

Cited by 20 - Related articles - All 5 versions

[PDF] arxiv.org

Indic-transformers: An analysis of transformer language models for Indian languages

K Jain, A Deshpande, K Shridhar, F Laumann… - arXiv preprint arXiv …, 2020 - arxiv.org

Language models based on the Transformer architecture have achieved state-of-the-art
performance on a wide range of NLP tasks such as text classification, question-answering …

Cited by 43 - Related articles - All 4 versions