Blocking and filtering techniques for entity resolution: A survey

G Papadakis, D Skoutas, E Thanos… - ACM Computing Surveys …, 2020 - dl.acm.org
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …

String similarity search and join: a survey

M Yu, G Li, D Deng, J Feng - Frontiers of Computer Science, 2016 - Springer
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …

A survey of indexing techniques for scalable record linkage and deduplication

P Christen - IEEE transactions on knowledge and data …, 2011 - ieeexplore.ieee.org
Record linkage is the process of matching records from several databases that refer to the
same entities. When applied on a single database, this process is known as deduplication …

Efficient similarity joins for near-duplicate detection

C Xiao, W Wang, X Lin, JX Yu, G Wang - ACM Transactions on Database …, 2011 - dl.acm.org
With the increasing amount of data and the need to integrate data from multiple data
sources, one of the challenging issues is to identify near-duplicate records efficiently. In this …

Efficient parallel set-similarity joins using mapreduce

R Vernica, MJ Carey, C Li - Proceedings of the 2010 ACM SIGMOD …, 2010 - dl.acm.org
In this paper we study how to efficiently perform set-similarity joins in parallel using the
popular MapReduce framework. We propose a 3-stage approach for end-to-end set …

Frameworks for entity matching: A comparison

H Köpcke, E Rahm - Data & Knowledge Engineering, 2010 - Elsevier
Entity matching is a crucial and difficult task for data integration. Entity matching frameworks
provide several methods and their combination to effectively solve different match tasks. In …

Modern privacy-preserving record linkage techniques: An overview

A Gkoulalas-Divanis, D Vatsalan… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
Record linkage is the challenging task of deciding which records, coming from disparate
data sources, refer to the same entity. Established back in 1946 by Halbert L. Dunn, the area …

Fast-join: An efficient method for fuzzy token matching based string similarity join

J Wang, G Li, J Feng - 2011 IEEE 27th International Conference …, 2011 - ieeexplore.ieee.org
String similarity join that finds similar string pairs between two string sets is an essential
operation in many applications, and has attracted significant attention recently in the …

Falcon: Scaling up hands-off crowdsourced entity matching to build cloud services

S Das, PS GC, AH Doan, JF Naughton… - Proceedings of the …, 2017 - dl.acm.org
Many works have applied crowdsourcing to entity matching (EM). While promising, these
approaches are limited in that they often require a developer to be in the loop. As such, it is …

Three-dimensional entity resolution with JedAI

G Papadakis, G Mandilaras, L Gagliardelli… - Information Systems, 2020 - Elsevier
Entity Resolution (ER) is the task of detecting different entity profiles that describe the same
real-world objects. To facilitate its execution, we have developed JedAI, an open-source …