A survey on task assignment in crowdsourcing
Quality improvement methods are essential to gathering high-quality crowdsourced data,
both for research and industry applications. A popular and broadly applicable method is task …
Crowdsourced data management: Overview and challenges
Many important data management and analytics tasks cannot be completely addressed by
automated processes. Crowdsourcing is an effective way to harness human cognitive …
Snorkel: Rapid training data creation with weak supervision
Labeling training data is increasingly the largest bottleneck in deploying machine learning
systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the …
Snorkel: rapid training data creation with weak supervision
Labeling training data is increasingly the largest bottleneck in deploying machine learning
systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the …
Data programming: Creating large training sets, quickly
Large labeled training sets are the critical building blocks of supervised learning methods
and are key enablers of deep learning techniques. For some applications, creating labeled …
Crowdsourced data management: A survey
Many important data management and analytics tasks cannot be completely addressed by
automated processes. These tasks, such as entity resolution, sentiment analysis, and image …
Snuba: Automating weak supervision to label training data
As deep learning models are applied to increasingly diverse problems, a key bottleneck is
gathering enough high-quality training labels tailored to each task. Users therefore turn to …
Learning the structure of generative models without labeled data
Curating labeled training data has become the primary bottleneck in machine learning.
Recent frameworks address this bottleneck with generative models to synthesize labels at …
Challenges in data crowdsourcing
Crowdsourcing refers to solving large problems by involving human workers that solve
component sub-problems or tasks. In data crowdsourcing, the problem involves data …
Denoising multi-source weak supervision for neural text classification
We study the problem of learning neural text classifiers without using any labeled data, but
only easy-to-provide rules as multiple weak supervision sources. This problem is …