Blocking and filtering techniques for entity resolution: A survey

G Papadakis, D Skoutas, E Thanos… - ACM Computing Surveys …, 2020 - dl.acm.org
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …

Human-in-the-loop data integration

G Li - Proceedings of the VLDB Endowment, 2017 - dl.acm.org
Data integration aims to integrate data in different sources and provide users with a unified
view. However, data integration cannot be completely addressed by purely automated …

Learned cardinality estimation for similarity queries

J Sun, G Li, N Tang - Proceedings of the 2021 International Conference …, 2021 - dl.acm.org
In this paper, we study the problem of using deep neural networks (DNNs) for estimating the
cardinality of similarity queries. Intuitively, DNNs can capture the distribution of data points …

Set similarity joins on mapreduce: An experimental survey

F Fier, N Augsten, P Bouros, U Leser… - Proceedings of the VLDB …, 2018 - dl.acm.org
Set similarity joins, which compute pairs of similar sets, constitute an important operator
primitive in a variety of applications, including applications that must process large amounts …

Optimizing in-memory database engine for AI-powered on-line decision augmentation using persistent memory

C Chen, J Yang, M Lu, T Wang, Z Zheng… - Proceedings of the …, 2021 - dl.acm.org
On-line decision augmentation (OLDA) has been considered as a promising paradigm for
real-time decision making powered by Artificial Intelligence (AI). OLDA has been widely …

A survey of blocking and filtering techniques for entity resolution

G Papadakis, D Skoutas, E Thanos… - arXiv preprint arXiv …, 2019 - arxiv.org
Efficiency techniques are an integral part of Entity Resolution, since its infancy. In this
survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid …

Similarity query support in big data management systems

T Kim, W Li, A Behm, I Cetindil, R Vernica, V Borkar… - Information Systems, 2020 - Elsevier
Similarity query processing is becoming increasingly important in many applications such as
data cleaning, record linkage, Web search, and document analytics. In this paper we study …

Smurf: Self-service string matching using random forests

GC Paul Suganthan, A Ardalan, AH Doan… - Proc. VLDB …, 2018 - pages.cs.wisc.edu
We argue that more attention should be devoted to developing self-service string matching
(SM) solutions, which lay users can easily use. We show that Falcon, a self-service entity …

Lcjoin: Set containment join via list crosscutting

D Deng, C Yang, S Shang, F Zhu… - 2019 IEEE 35th …, 2019 - ieeexplore.ieee.org
A set containment join operates on two set-valued attributes with a subset (⊆) relationship
as the join condition. It has many real-world applications, such as in publish/subscribe …

Personalized query recommendation system: A genetic algorithm approach

D Barman, R Sarkar, A Tudu… - Journal of …, 2020 - Taylor & Francis
Search engine has become an integral part of our daily life. It helps users to find a specific
desirable information from the large amount of data stored in the web. Query …