On LLMs-driven synthetic data generation, curation, and evaluation: A survey

L Long, R Wang, R Xiao, J Zhao, X Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Within the evolving landscape of deep learning, the dilemma of data quantity and quality has
been a long-standing problem. The recent advent of Large Language Models (LLMs) offers …

A survey on programmatic weak supervision

J Zhang, CY Hsieh, Y Yu, C Zhang, A Ratner - arXiv preprint arXiv …, 2022 - arxiv.org
Labeling training data has become one of the major roadblocks to using machine learning.
Among various weak supervision paradigms, programmatic weak supervision (PWS) has …

CoAnnotating: Uncertainty-guided work allocation between human and large language models for data annotation

M Li, T Shi, C Ziems, MY Kan, NF Chen, Z Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Annotated data plays a critical role in Natural Language Processing (NLP) in training
models and evaluating their performance. Given recent developments in Large Language …

Language models in the loop: Incorporating prompting into weak supervision

R Smith, JA Fries, B Hancock, SH Bach - ACM/IMS Journal of Data …, 2024 - dl.acm.org
We propose a new strategy for applying large pre-trained language models to novel tasks
when labeled training data is limited. Rather than apply the model in a typical zero-shot or …

Understanding programmatic weak supervision via source-aware influence function

J Zhang, H Wang, CY Hsieh… - Advances in neural …, 2022 - proceedings.neurips.cc
Programmatic Weak Supervision (PWS) aggregates the source votes of multiple
weak supervision sources into probabilistic training labels, which are in turn used to train an …

Losses over labels: Weakly supervised learning via direct loss construction

D Sam, JZ Kolter - Proceedings of the AAAI conference on artificial …, 2023 - ojs.aaai.org
Owing to the prohibitive costs of generating large amounts of labeled data, programmatic
weak supervision is a growing paradigm within machine learning. In this setting, users …

Leveraging instance features for label aggregation in programmatic weak supervision

J Zhang, L Song, A Ratner - International Conference on …, 2023 - proceedings.mlr.press
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to
synthesize training labels efficiently. The core component of PWS is the label model, which …

Robust weak supervision with variational auto-encoders

F Tonolini, N Aletras, Y Jiao… - … Conference on Machine …, 2023 - proceedings.mlr.press
Recent advances in weak supervision (WS) techniques make it possible to mitigate the enormous cost
and effort of human data annotation for supervised machine learning by automating it using …

How many validation labels do you need? exploring the design space of label-efficient model ranking

Z Hu, J Zhang, Y Yu, Y Zhuang, H Xiong - arXiv preprint arXiv:2312.01619, 2023 - arxiv.org
The paper introduces LEMR, a framework that reduces annotation costs for model selection
tasks. Our approach leverages ensemble methods to generate pseudo-labels, employs …

Cross-task Knowledge Transfer for Extremely Weakly Supervised Text Classification

S Park, K Kim, J Lee - Findings of the Association for …, 2023 - aclanthology.org
Text classification with extremely weak supervision (EWS) imposes stricter supervision
constraints compared to regular weakly supervised classification. Absolutely no labeled …