Snorkel: Rapid training data creation with weak supervision
Labeling training data is increasingly the largest bottleneck in deploying machine learning
systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the …
A survey on truth discovery
Thanks to information explosion, data for the objects of interest can be collected from
increasingly more sources. However, for the same object, there usually exist conflicts among …
Big data integration
XL Dong, D Srivastava - 2013 IEEE 29th international …, 2013 - ieeexplore.ieee.org
The Big Data era is upon us: data is being generated, collected and analyzed at an
unprecedented scale, and data-driven decision making is sweeping through all aspects of …
Debugging inputs
L Kirschner, E Soremekun, A Zeller - Proceedings of the ACM/IEEE 42nd …, 2020 - dl.acm.org
When a program fails to process an input, it need not be the program code that is at fault. It
can also be that the input data is faulty, for instance as a result of data corruption. To get the …
Snuba: Automating weak supervision to label training data
As deep learning models are applied to increasingly diverse problems, a key bottleneck is
gathering enough high-quality training labels tailored to each task. Users therefore turn to …
Knowledge-based trust: Estimating the trustworthiness of web sources
The quality of web sources has been traditionally evaluated using exogenous signals such
as the hyperlink structure of the graph. We propose a new approach that relies on …
Truth discovery algorithms: An experimental evaluation
DA Waguih, L Berti-Equille - arXiv preprint arXiv:1409.6428, 2014 - arxiv.org
A fundamental problem in data fusion is to determine the veracity of multi-source data in
order to resolve conflicts. While previous work in truth discovery has proved to be useful in …
From data fusion to knowledge fusion
The task of data fusion is to identify the true values of data items (e.g., the true date of
birth for Tom Cruise) among multiple observed values drawn from different sources …
QASCA: A quality-aware task assignment system for crowdsourcing applications
A crowdsourcing system, such as the Amazon Mechanical Turk (AMT), provides a platform
for a large number of questions to be answered by Internet workers. Such systems have …