A survey on task assignment in crowdsourcing

D Hettiachchi, V Kostakos, J Goncalves - ACM Computing Surveys …, 2022 - dl.acm.org
Quality improvement methods are essential to gathering high-quality crowdsourced data,
both for research and industry applications. A popular and broadly applicable method is task …

Crowdsourced data management: Overview and challenges

G Li, Y Zheng, J Fan, J Wang, R Cheng - Proceedings of the 2017 ACM …, 2017 - dl.acm.org
Many important data management and analytics tasks cannot be completely addressed by
automated processes. Crowdsourcing is an effective way to harness human cognitive …

Snorkel: Rapid training data creation with weak supervision

A Ratner, SH Bach, H Ehrenberg, J Fries… - Proceedings of the …, 2017 - ncbi.nlm.nih.gov
Labeling training data is increasingly the largest bottleneck in deploying machine learning
systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the …

Snorkel: rapid training data creation with weak supervision

A Ratner, SH Bach, H Ehrenberg, J Fries, S Wu, C Ré - The VLDB Journal, 2020 - Springer
Labeling training data is increasingly the largest bottleneck in deploying machine learning
systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the …

Data programming: Creating large training sets, quickly

AJ Ratner, CM De Sa, S Wu… - Advances in neural …, 2016 - proceedings.neurips.cc
Large labeled training sets are the critical building blocks of supervised learning methods
and are key enablers of deep learning techniques. For some applications, creating labeled …

Crowdsourced data management: A survey

G Li, J Wang, Y Zheng… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Many important data management and analytics tasks cannot be completely addressed by
automated processes. These tasks, such as entity resolution, sentiment analysis, and image …

Snuba: Automating weak supervision to label training data

P Varma, C Ré - … of the VLDB Endowment. International Conference …, 2018 - ncbi.nlm.nih.gov
As deep learning models are applied to increasingly diverse problems, a key bottleneck is
gathering enough high-quality training labels tailored to each task. Users therefore turn to …

Learning the structure of generative models without labeled data

SH Bach, B He, A Ratner, C Ré - … Conference on Machine …, 2017 - proceedings.mlr.press
Curating labeled training data has become the primary bottleneck in machine learning.
Recent frameworks address this bottleneck with generative models to synthesize labels at …

Challenges in data crowdsourcing

H Garcia-Molina, M Joglekar, A Marcus… - … on Knowledge and …, 2016 - ieeexplore.ieee.org
Crowdsourcing refers to solving large problems by involving human workers that solve
component sub-problems or tasks. In data crowdsourcing, the problem involves data …

Denoising multi-source weak supervision for neural text classification

W Ren, Y Li, H Su, D Kartchner, C Mitchell… - arXiv preprint arXiv …, 2020 - arxiv.org
We study the problem of learning neural text classifiers without using any labeled data, but
only easy-to-provide rules as multiple weak supervision sources. This problem is …