Blocking and filtering techniques for entity resolution: A survey

G Papadakis, D Skoutas, E Thanos… - ACM Computing Surveys …, 2020 - dl.acm.org
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …
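
The quadratic cost mentioned above, and the way blocking reduces it, can be sketched as follows. This is a minimal illustration of standard (key-based) blocking only, not the survey's taxonomy or any specific method from it. The function names, the record fields, and the choice of city as the blocking key are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs(profiles):
    # Brute-force ER: every profile is compared with every other one,
    # giving n*(n-1)/2 candidate pairs (the quadratic cost).
    return list(combinations(range(len(profiles)), 2))


def standard_blocking(profiles, key):
    # Hypothetical key-based blocking: group profiles by a blocking key and
    # only compare profiles that share a block.
    blocks = defaultdict(list)
    for i, p in enumerate(profiles):
        blocks[key(p)].append(i)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(ids, 2))
    return sorted(pairs)


if __name__ == "__main__":
    profiles = [
        {"name": "John Smith", "city": "Boston"},
        {"name": "Jon Smith", "city": "Boston"},
        {"name": "Mary Jones", "city": "Denver"},
        {"name": "M. Jones", "city": "Denver"},
        {"name": "Ann Lee", "city": "Austin"},
    ]
    city_key = lambda p: p["city"].lower()  # assumed blocking key
    print(len(all_pairs(profiles)))              # 10 comparisons
    print(standard_blocking(profiles, city_key))  # [(0, 1), (2, 3)]
```

Blocking trades recall for speed. A true match whose profiles land in different blocks (e.g. a misspelled city) is never compared, which is why real systems use redundant or fuzzier keys.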

Learned cardinality estimation for similarity queries

J Sun, G Li, N Tang - Proceedings of the 2021 International Conference …, 2021 - dl.acm.org
In this paper, we study the problem of using deep neural networks (DNNs) for estimating the
cardinality of similarity queries. Intuitively, DNNs can capture the distribution of data points …

Resource allocation in cloud computing using genetic algorithm and neural network

M Manavi, Y Zhang, G Chen - 2023 IEEE 8th International …, 2023 - ieeexplore.ieee.org
Cloud computing is one of the most widely used distributed systems for data processing and data
storage. Due to the continuous increase in the size of the data processed by cloud …

A survey of blocking and filtering techniques for entity resolution

G Papadakis, D Skoutas, E Thanos… - arXiv preprint arXiv …, 2019 - arxiv.org
Efficiency techniques have been an integral part of Entity Resolution since its infancy. In this
survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid …

Balanced scheduling of distributed workflow tasks based on clustering

D Yu, Y Ying, L Zhang, C Liu, X Sun, H Zheng - Knowledge-Based Systems, 2020 - Elsevier
Distributed computing, such as the Cloud, provides traditional workflow applications with
a completely new deployment architecture offering high performance and scalability …

Reasoning on property graphs with graph generating dependencies

LC Shimomura, N Yakovets, G Fletcher - Information Sciences, 2024 - Elsevier
Data dependencies are a key concept in data management and have been researched in
data integration, data quality and query optimization. With the increasing use of graph …

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings

Y Wang - arXiv preprint arXiv:2204.07922, 2022 - arxiv.org
Similarity queries are a family of queries based on similarity metrics. Unlike
traditional database queries, which are mostly based on value equality, similarity queries aim …
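The contrast between an equality predicate and a similarity predicate can be sketched with a brute-force scan over small vectors. This assumes cosine similarity and a hypothetical threshold of 0.9. It is a linear-scan illustration, not the indexing techniques the survey covers, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equal-length vectors; 0.0 if either is zero.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def equality_query(data, q):
    # Traditional predicate: return ids of items exactly equal to q.
    return [i for i, x in enumerate(data) if x == q]


def similarity_range_query(data, q, threshold):
    # Similarity predicate: return ids of items whose similarity to q
    # is at least the threshold (linear scan, no index).
    return [i for i, x in enumerate(data) if cosine(x, q) >= threshold]


if __name__ == "__main__":
    data = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
    q = (1.0, 0.0)
    print(equality_query(data, q))               # [0]
    print(similarity_range_query(data, q, 0.9))  # [0, 1]
```

The length of the returned list is the query's cardinality. This is the quantity a learned cardinality estimator tries to predict without running the scan.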

Semi-stream similarity join processing in a distributed environment

HJ Kim, KH Lee - IEEE Access, 2020 - ieeexplore.ieee.org
Similarity join has become very important for semi-structured and unstructured data processing and
analysis. Although several studies have been conducted on the similarity join, little attention …

Crowdsourced collective entity resolution with relational match propagation

J Huang, W Hu, Z Bao, Y Qu - 2020 IEEE 36th International …, 2020 - ieeexplore.ieee.org
Knowledge bases (KBs) store rich yet heterogeneous entities and facts. Entity resolution
(ER) aims to identify entities in KBs which refer to the same real-world object. Recent studies …

Internal and external memory set containment join

C Yang, D Deng, S Shang, F Zhu, L Liu, L Shao - The VLDB Journal, 2021 - Springer
A set containment join operates on two set-valued attributes with a subset (⊆) relationship
as the join condition. It has many real-world applications, such as in publish/subscribe …
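
The join condition described above can be sketched in two ways: a nested-loop baseline, and a simple inverted-index filter that intersects the posting lists of each left-hand set's elements. This is a textbook baseline under assumed in-memory inputs, not the internal- or external-memory algorithms proposed in the paper. The function names are hypothetical.

```python
from collections import defaultdict


def scj_naive(R, S):
    # Nested loops: emit (i, j) whenever R[i] is a subset of S[j].
    return [(i, j) for i, r in enumerate(R) for j, s in enumerate(S) if r <= s]


def scj_inverted_index(R, S):
    # Inverted index over S: element -> ids of S records containing it.
    index = defaultdict(set)
    for j, s in enumerate(S):
        for e in s:
            index[e].add(j)
    all_ids = set(range(len(S)))
    result = []
    for i, r in enumerate(R):
        # The empty set is contained in every record, so start from all ids.
        candidates = set(all_ids)
        # Intersect posting lists, rarest element first, to shrink candidates fast.
        for e in sorted(r, key=lambda x: len(index.get(x, ()))):
            candidates &= index.get(e, set())
            if not candidates:
                break
        result.extend((i, j) for j in sorted(candidates))
    return result


if __name__ == "__main__":
    R = [{1, 2}, {3}, set()]
    S = [{1, 2, 3}, {2, 3}, {1}]
    print(scj_naive(R, S))
    print(scj_inverted_index(R, S))
    # Both print [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
```

Every id that survives the intersections contains all elements of `R[i]`, so no verification step is needed. A publish/subscribe system would use the same shape of query, with subscriptions as `R` and events as `S`.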