Snorkel: Rapid training data creation with weak supervision

A Ratner, SH Bach, H Ehrenberg, J Fries… - Proceedings of the …, 2017 - ncbi.nlm.nih.gov
Labeling training data is increasingly the largest bottleneck in deploying machine learning
systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the …

Snorkel: rapid training data creation with weak supervision

A Ratner, SH Bach, H Ehrenberg, J Fries, S Wu, C Ré - The VLDB Journal, 2020 - Springer
Labeling training data is increasingly the largest bottleneck in deploying machine learning
systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the …

A survey on truth discovery

Y Li, J Gao, C Meng, Q Li, L Su, B Zhao… - ACM SIGKDD …, 2016 - dl.acm.org
Thanks to information explosion, data for the objects of interest can be collected from
increasingly more sources. However, for the same object, there usually exist conflicts among …

Big data integration

XL Dong, D Srivastava - 2013 IEEE 29th international …, 2013 - ieeexplore.ieee.org
The Big Data era is upon us: data is being generated, collected and analyzed at an
unprecedented scale, and data-driven decision making is sweeping through all aspects of …

Debugging inputs

L Kirschner, E Soremekun, A Zeller - Proceedings of the ACM/IEEE 42nd …, 2020 - dl.acm.org
When a program fails to process an input, it need not be the program code that is at fault. It
can also be that the input data is faulty, for instance as a result of data corruption. To get the …

Snuba: Automating weak supervision to label training data

P Varma, C Ré - … of the VLDB Endowment. International Conference …, 2018 - ncbi.nlm.nih.gov
As deep learning models are applied to increasingly diverse problems, a key bottleneck is
gathering enough high-quality training labels tailored to each task. Users therefore turn to …

Knowledge-based trust: Estimating the trustworthiness of web sources

XL Dong, E Gabrilovich, K Murphy, V Dang… - arXiv preprint arXiv …, 2015 - arxiv.org
The quality of web sources has been traditionally evaluated using exogenous signals such
as the hyperlink structure of the graph. We propose a new approach that relies on …

Truth discovery algorithms: An experimental evaluation

DA Waguih, L Berti-Equille - arXiv preprint arXiv:1409.6428, 2014 - arxiv.org
A fundamental problem in data fusion is to determine the veracity of multi-source data in
order to resolve conflicts. While previous work in truth discovery has proved to be useful in …

From data fusion to knowledge fusion

XL Dong, E Gabrilovich, G Heitz, W Horn… - arXiv preprint arXiv …, 2015 - arxiv.org
The task of data fusion is to identify the true values of data items (e.g., the true date of
birth for Tom Cruise) among multiple observed values drawn from different sources …

QASCA: A quality-aware task assignment system for crowdsourcing applications

Y Zheng, J Wang, G Li, R Cheng, J Feng - Proceedings of the 2015 ACM …, 2015 - dl.acm.org
A crowdsourcing system, such as the Amazon Mechanical Turk (AMT), provides a platform
for a large number of questions to be answered by Internet workers. Such systems have …