An overview of end-to-end entity resolution for big data

V Christophides, V Efthymiou, T Palpanas… - ACM Computing …, 2020 - dl.acm.org
One of the most critical tasks for improving data quality and increasing the reliability of data
analytics is Entity Resolution (ER), which aims to identify different descriptions that refer to …

Blocking and filtering techniques for entity resolution: A survey

G Papadakis, D Skoutas, E Thanos… - ACM Computing Surveys …, 2020 - dl.acm.org
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …

[图书][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

Crowdsourced data management: A survey

G Li, J Wang, Y Zheng… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Any important data management and analytics tasks cannot be completely addressed by
automated processes. These tasks, such as entity resolution, sentiment analysis, and image …

Josie: Overlap set similarity search for finding joinable tables in data lakes

E Zhu, D Deng, F Nargesian, RJ Miller - Proceedings of the 2019 …, 2019 - dl.acm.org
We present a new solution for finding joinable tables in massive data lakes: given a table
and one join column, find tables that can be joined with the given table on the largest …

QASCA: A quality-aware task assignment system for crowdsourcing applications

Y Zheng, J Wang, G Li, R Cheng, J Feng - Proceedings of the 2015 ACM …, 2015 - dl.acm.org
A crowdsourcing system, such as the Amazon Mechanical Turk (AMT), provides a platform
for a large number of questions to be answered by Internet workers. Such systems have …

Question answering over knowledge graphs: question understanding via template decomposition

W Zheng, JX Yu, L Zou, H Cheng - Proceedings of the VLDB Endowment, 2018 - dl.acm.org
The gap between unstructured natural language and structured data makes it challenging to
build a system that supports using natural language to query large knowledge graphs. Many …

An empirical evaluation of set similarity join techniques

W Mann, N Augsten, P Bouros - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
Set similarity joins compute all pairs of similar sets from two collections of sets. We conduct
extensive experiments on seven state-of-the-art algorithms for set similarity joins. These …

String similarity search and join: a survey

M Yu, G Li, D Deng, J Feng - Frontiers of Computer Science, 2016 - Springer
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …

String similarity joins: An experimental evaluation

Y Jiang, G Li, J Feng, WS Li - Proceedings of the VLDB Endowment, 2014 - dl.acm.org
String similarity join is an important operation in data integration and cleansing that finds
similar string pairs from two collections of strings. More than ten algorithms have been …