Findings of the 2021 conference on machine translation (WMT21)

F Akhbardeh, A Arkhangorodsky, M Biesialska… - Proceedings of the sixth …, 2021 - cris.fbk.eu
This paper presents the results of the news translation task, the multilingual low-resource
translation for Indo-European languages, the triangular translation task, and the automatic …

Data representativity for machine learning and AI systems

LH Clemmensen, RD Kjærsgaard - arXiv preprint arXiv:2203.04706, 2022 - arxiv.org
Data representativity is crucial when drawing inference from data through machine learning
models. Scholars have increased focus on unraveling the bias and fairness in models, also …

Ethical considerations for responsible data curation

J Andrews, D Zhao, W Thong… - Advances in …, 2024 - proceedings.neurips.cc
Human-centric computer vision (HCCV) data curation practices often neglect privacy and
bias concerns, leading to dataset retractions and unfair models. HCCV datasets constructed …

RuCoLA: Russian corpus of linguistic acceptability

V Mikhailov, T Shamardina, M Ryabinin… - arXiv preprint arXiv …, 2022 - arxiv.org
Linguistic acceptability (LA) attracts the attention of the research community due to its many
uses, such as testing the grammatical knowledge of language models and filtering …

TAPE: Assessing few-shot Russian language understanding

E Taktasheva, T Shavrina, A Fenogenova… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent advances in zero-shot and few-shot learning have shown promise for a scope of
research and practical purposes. However, this fast-growing area lacks standardized …

Deep learning based speech recognition for hyperkinetic dysarthria disorder

AM Hashan, CR Dmitrievich… - 2024 IEEE Ural …, 2024 - ieeexplore.ieee.org
Speech recognition is a technology that aims to transform human speech into text and has
applications in a variety of fields, including information technology, healthcare, automobiles …

A general-purpose crowdsourcing computational quality control toolkit for Python

D Ustalov, N Pavlichenko, V Losev… - The Ninth AAAI …, 2021 - humancomputation.com
Quality control is a crux of crowdsourcing. While most means for quality control are
organizational and imply worker selection, golden tasks, and post-acceptance …

Learning from Crowds with Crowd-Kit

D Ustalov, N Pavlichenko, B Tseitlin - arXiv preprint arXiv:2109.08584, 2021 - arxiv.org
Quality control is a crux of crowdsourcing. While most means for quality control are
organizational and imply worker selection, golden tasks, and post-acceptance …

Principlism Guided Responsible Data Curation

JTA Andrews, D Zhao, W Thong, A Modas… - arXiv preprint arXiv …, 2023 - dmlr.ai
Human-centric computer vision (HCCV) data curation practices often neglect privacy and
bias concerns, leading to dataset retractions and unfair models. Further, HCCV datasets …

Challenges in Data Production for AI with Human-in-the-Loop

D Ustalov - Proceedings of the Fifteenth ACM International …, 2022 - dl.acm.org
Today, successful Artificial Intelligence applications rely on three pillars: machine learning
algorithms, hardware for running them, and data for training and evaluating models …