Blocking and filtering techniques for entity resolution: A survey
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …
Human-in-the-loop data integration
G Li - Proceedings of the VLDB Endowment, 2017 - dl.acm.org
Data integration aims to integrate data in different sources and provide users with a unified
view. However, data integration cannot be completely addressed by purely automated …
Learned cardinality estimation for similarity queries
In this paper, we study the problem of using deep neural networks (DNNs) for estimating the
cardinality of similarity queries. Intuitively, DNNs can capture the distribution of data points …
Set similarity joins on mapreduce: An experimental survey
Set similarity joins, which compute pairs of similar sets, constitute an important operator
primitive in a variety of applications, including applications that must process large amounts …
Optimizing in-memory database engine for AI-powered on-line decision augmentation using persistent memory
On-line decision augmentation (OLDA) has been considered as a promising paradigm for
real-time decision making powered by Artificial Intelligence (AI). OLDA has been widely …
A survey of blocking and filtering techniques for entity resolution
Efficiency techniques are an integral part of Entity Resolution, since its infancy. In this
survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid …
Similarity query support in big data management systems
T Kim, W Li, A Behm, I Cetindil, R Vernica, V Borkar… - Information Systems, 2020 - Elsevier
Similarity query processing is becoming increasingly important in many applications such as
data cleaning, record linkage, Web search, and document analytics. In this paper we study …
Smurf: Self-service string matching using random forests
We argue that more attention should be devoted to developing self-service string matching
(SM) solutions, which lay users can easily use. We show that Falcon, a self-service entity …
Lcjoin: Set containment join via list crosscutting
A set containment join operates on two set-valued attributes with a subset (⊆) relationship
as the join condition. It has many real-world applications, such as in publish/subscribe …
Personalized query recommendation system: A genetic algorithm approach
D Barman, R Sarkar, A Tudu… - Journal of …, 2020 - Taylor & Francis
Search engine has become an integral part of our daily life. It helps users to find a specific
desirable information from the large amount of data stored in the web. Query …