Blocking and filtering techniques for entity resolution: A survey

G Papadakis, D Skoutas, E Thanos… - ACM Computing Surveys …, 2020 - dl.acm.org
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …

An empirical evaluation of set similarity join techniques

W Mann, N Augsten, P Bouros - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
Set similarity joins compute all pairs of similar sets from two collections of sets. We conduct
extensive experiments on seven state-of-the-art algorithms for set similarity joins. These …

Coupled attribute similarity learning on categorical data

C Wang, X Dong, F Zhou, L Cao… - IEEE transactions on …, 2014 - ieeexplore.ieee.org
Attribute independence has been taken as a major assumption in the limited research that
has been conducted on similarity analysis for categorical data, especially unsupervised …

Leveraging set relations in exact set similarity join

X Wang, L Qin, X Lin, Y Zhang… - Proceedings of the VLDB …, 2017 - opus.lib.uts.edu.au
Exact set similarity join, which finds all the similar set pairs from two
collections of sets, is a fundamental problem with a wide range of applications. The existing …

A survey of blocking and filtering techniques for entity resolution

G Papadakis, D Skoutas, E Thanos… - arXiv preprint arXiv …, 2019 - arxiv.org
Efficiency techniques are an integral part of Entity Resolution, since its infancy. In this
survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid …

[HTML] Parallel set similarity join on big data based on locality-sensitive hashing

MK Sohrabi, H Azgomi - Science of computer programming, 2017 - Elsevier
Due to the huge amount of involved data and time-consuming process of join operations, the
exact-match joins are rarely used for big data. The most common alternative for exact-match …

Similarity query support in big data management systems

T Kim, W Li, A Behm, I Cetindil, R Vernica, V Borkar… - Information Systems, 2020 - Elsevier
Similarity query processing is becoming increasingly important in many applications such as
data cleaning, record linkage, Web search, and document analytics. In this paper we study …

[HTML] Data structure set-trie for storing and querying sets: Theoretical and empirical analysis

I Savnik, M Akulich, M Krnc, R Škrekovski - Plos one, 2021 - journals.plos.org
Set containment operations form an important tool in various fields such as information
retrieval, AI systems, object-relational databases, and Internet applications. In the paper, a …

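[PDF] Similarity queries on sets and strings (集合和字符串的相似度查询)

林学民, 王炜 - 2011 - cjc.ict.ac.cn
Abstract: Similarity querying is an important problem in computer science, with applications
across many fields, such as databases, data integration, the Internet, data mining, and
bioinformatics. This paper mainly discusses similarity queries on sets and strings …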

[PDF] PEL: Position-Enhanced Length Filter for Set Similarity Joins.

W Mann, N Augsten - Grundlagen von Datenbanken, 2014 - Citeseer
Set similarity joins compute all pairs of similar sets from two collections of sets. Set similarity
joins are typically implemented in a filter-verify framework: a filter generates candidate pairs …