Large language models for data annotation: A survey

Z Tan, A Beigi, S Wang, R Guo, A Bhattacharjee… - arXiv preprint arXiv …, 2024 - arxiv.org
Data annotation is the labeling or tagging of raw data with relevant information, essential for
improving the efficacy of machine learning models. The process, however, is labor-intensive …

Active learning by acquiring contrastive examples

K Margatina, G Vernikos, L Barrault… - arXiv preprint arXiv …, 2021 - arxiv.org
Common acquisition functions for active learning use either uncertainty or diversity
sampling, aiming to select difficult and diverse data points from the pool of unlabeled data …
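The uncertainty-sampling half of the acquisition functions this snippet mentions can be illustrated with a minimal sketch (this is not the paper's contrastive method; the function names and the toy pool below are hypothetical, for illustration only):

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of each example's predicted class distribution.

    probs: (n_examples, n_classes) array of softmax outputs.
    """
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def select_uncertain(probs, k):
    """Pick the k pool examples the model is least certain about."""
    scores = predictive_entropy(probs)
    return np.argsort(-scores)[:k]

# Toy pool of 4 unlabeled examples over 3 classes.
pool = np.array([
    [0.98, 0.01, 0.01],   # confident -> low entropy
    [0.34, 0.33, 0.33],   # near-uniform -> highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],
])
print(select_uncertain(pool, 2))  # -> [1 3]
```

Diversity-based acquisition would instead score candidates by distance to already-selected points in some representation space; the paper's premise is that these two families can pick very different examples.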

Uncertainty in natural language processing: Sources, quantification, and applications

M Hu, Z Zhang, S Zhao, M Huang, B Wu - arXiv preprint arXiv:2306.04459, 2023 - arxiv.org
As a main field of artificial intelligence, natural language processing (NLP) has achieved
remarkable success via deep neural networks. Plenty of NLP tasks have been addressed in …

A survey of active learning for natural language processing

Z Zhang, E Strubell, E Hovy - arXiv preprint arXiv:2210.10109, 2022 - arxiv.org
In this work, we provide a survey of active learning (AL) and its applications in natural
language processing (NLP). In addition to a fine-grained categorization of query strategies …


Interactive natural language processing

Z Wang, G Zhang, K Yang, N Shi, W Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within
the field of NLP, aimed at addressing limitations in existing frameworks while aligning with …

Medical image captioning via generative pretrained transformers

A Selivanov, OY Rogov, D Chesakov, A Shelmanov… - Scientific Reports, 2023 - nature.com
The proposed model for automatic clinical image caption generation combines the analysis
of radiological scans with structured patient information from the textual records. It uses two …

Active learning helps pretrained models learn the intended task

A Tamkin, D Nguyen, S Deshpande… - Advances in …, 2022 - proceedings.neurips.cc
Models can fail in unpredictable ways during deployment due to task ambiguity,
when multiple behaviors are consistent with the provided training data. An example is an …

Quantifying aleatoric and epistemic uncertainty in machine learning: Are conditional entropy and mutual information appropriate measures?

L Wimmer, Y Sale, P Hofman, B Bischl… - Uncertainty in …, 2023 - proceedings.mlr.press
The quantification of aleatoric and epistemic uncertainty in terms of conditional entropy and
mutual information, respectively, has recently become quite common in machine learning …
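The decomposition this snippet refers to splits total predictive entropy into an expected conditional entropy (the aleatoric part) and a mutual-information term (the epistemic part). A minimal sketch under the standard ensemble formulation (the function names and toy ensemble are hypothetical, not taken from the paper):

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats, with a small epsilon for numerical safety."""
    return -np.sum(p * np.log(p + 1e-12), axis=axis)

def decompose_uncertainty(member_probs):
    """Split total predictive entropy for one input into its two parts.

    member_probs: (n_members, n_classes) class probabilities from an
    ensemble (or MC-dropout samples) for a single input.
    Returns (total, aleatoric, epistemic) where
      total     = H[ mean_m p(y|x, m) ]
      aleatoric = E_m H[ p(y|x, m) ]     (expected conditional entropy)
      epistemic = total - aleatoric      (mutual information I(y; m | x))
    """
    mean = member_probs.mean(axis=0)
    total = entropy(mean)
    aleatoric = entropy(member_probs, axis=1).mean()
    return total, aleatoric, total - aleatoric

# Two members that individually have low entropy but disagree sharply:
# the disagreement shows up as a large epistemic (MI) term.
probs = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
total, alea, epi = decompose_uncertainty(probs)
```

The paper's question is precisely whether these two terms are faithful measures of aleatoric and epistemic uncertainty; the sketch only shows how they are computed.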

How certain is your Transformer?

A Shelmanov, E Tsymbalov, D Puzyrev… - Proceedings of the …, 2021 - aclanthology.org
In this work, we consider the problem of uncertainty estimation for Transformer-based
models. We investigate the applicability of uncertainty estimates based on dropout usage at …
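The dropout-based uncertainty estimates this snippet investigates follow the Monte-Carlo dropout recipe: keep dropout active at inference and aggregate several stochastic forward passes. A self-contained NumPy sketch on a toy two-layer network (random weights, hypothetical function names; the paper works with full Transformer models, not this toy):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, W2, p_drop, rng):
    """One stochastic forward pass: dropout stays ON at inference."""
    h = np.maximum(x @ W1, 0.0)                      # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop              # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)                    # inverted-dropout scaling
    logits = h @ W2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)         # softmax probabilities

def mc_dropout_predict(x, W1, W2, p_drop=0.3, n_samples=100, rng=rng):
    """Average n_samples stochastic passes; the spread across passes
    serves as a (model) uncertainty estimate."""
    samples = np.stack([mlp_forward(x, W1, W2, p_drop, rng)
                        for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

# Toy network with random weights, for illustration only.
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 3))
x = rng.normal(size=(1, 4))
mean_probs, std_probs = mc_dropout_predict(x, W1, W2)
```

In practice the same idea is applied to a trained Transformer by leaving its dropout layers in training mode during prediction, which is the setting the paper evaluates.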

Active learning for abstractive text summarization

A Tsvigun, I Lysenko, D Sedashov, I Lazichny… - arXiv preprint arXiv …, 2023 - arxiv.org
Construction of human-curated annotated datasets for abstractive text summarization (ATS)
is very time-consuming and expensive because creating each instance requires a human …