Neural networks for entity matching: A survey

N Barlaug, JA Gulla - ACM Transactions on Knowledge Discovery from …, 2021 - dl.acm.org
Entity matching is the problem of identifying which records refer to the same real-world
entity. It has been actively researched for decades, and a variety of different approaches …

Human-in-the-loop data integration

G Li - Proceedings of the VLDB Endowment, 2017 - dl.acm.org
Data integration aims to integrate data in different sources and provide users with a unified
view. However, data integration cannot be completely addressed by purely automated …

An approach to extracting complex knowledge patterns among concepts belonging to structured, semi-structured and unstructured sources in a data lake

PL Giudice, L Musarella, G Sofo, D Ursino - Information Sciences, 2019 - Elsevier
In this paper, we propose a new network-based model to uniformly represent the structured,
semi-structured and unstructured sources of a data lake, which is one of the newest and …

Improving question answering over incomplete knowledge graphs with relation prediction

F Zhao, Y Li, J Hou, L Bai - Neural Computing and Applications, 2022 - Springer
Large-scale knowledge graphs (KGs) play a critical role in question answering over KGs
(KGs-QA). Despite their large scale, KGs suffer from incompleteness, which has fueled a lot of …

A novel cost-based model for data repairing

S Hao, N Tang, G Li, J He, N Ta… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Integrity constraint based data repairing is an iterative process consisting of two parts: detect
and group errors that violate given integrity constraints (ICs); and modify values inside each …
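The two-part process in this snippet (detect and group violations, then modify values) can be illustrated with a minimal sketch. It assumes a single functional dependency, zip → city, as the integrity constraint, a small hypothetical table, and a uniform cost of one per changed cell. Under that unit cost, the cheapest repair of a violating group is to overwrite it with its most frequent value. This is a simplification for illustration, not the cost-based model proposed by Hao et al.; the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical table and constraint (FD zip -> city), for illustration only.
records = [
    {"id": 1, "zip": "10001", "city": "New York"},
    {"id": 2, "zip": "10001", "city": "New York"},
    {"id": 3, "zip": "10001", "city": "NYC"},
    {"id": 4, "zip": "94105", "city": "San Francisco"},
]


def detect_violation_groups(rows, lhs, rhs):
    """Detect and group errors: bucket rows by the FD's left-hand side and
    keep only buckets holding more than one distinct right-hand-side value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[lhs]].append(row)
    return {key: grp for key, grp in groups.items()
            if len({r[rhs] for r in grp}) > 1}


def repair_by_majority(rows, lhs, rhs):
    """Modify values inside each violating group, repeating until no violations
    remain. Assumes unit cost per changed cell, so the group's most frequent
    value needs the fewest changes. Returns the number of cells changed."""
    changes = 0
    while True:
        violations = detect_violation_groups(rows, lhs, rhs)
        if not violations:
            return changes
        for grp in violations.values():
            target, _ = Counter(r[rhs] for r in grp).most_common(1)[0]
            for r in grp:
                if r[rhs] != target:
                    r[rhs] = target
                    changes += 1
```

Calling `repair_by_majority(records, "zip", "city")` rewrites record 3's city to "New York" and returns 1. With a single FD the loop converges after one pass. With several interacting constraints, a fixed majority rule like this one is not guaranteed to terminate or to be cheapest overall, which is where a real cost model becomes necessary.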

JEDI: These aren't the JSON documents you're looking for...

T Hütter, N Augsten, CM Kirsch, MJ Carey… - Proceedings of the 2022 …, 2022 - dl.acm.org
The JavaScript Object Notation (JSON) is a popular data format used in document stores to
natively support semi-structured data. In this paper, we address the problem of JSON …

Privacy preserving similarity joins using MapReduce

X Ding, W Yang, KKR Choo, X Wang, H Jin - Information Sciences, 2019 - Elsevier
Similarity join is an essential operator in data processing, mining and analysis. However, it is
resource intensive and time consuming, particularly when processing big data. There is also …

Balance-aware distributed string similarity-based query processing system

J Sun, Z Shang, G Li, D Deng, Z Bao - Proceedings of the VLDB …, 2019 - dl.acm.org
Data analysts spend more than 80% of their time on data cleaning and integration in the whole
process of data analytics due to data errors and inconsistencies. Similarity-based query …

Towards a unified framework for string similarity joins

P Xu, J Lu - Proceedings of the VLDB Endowment, 2019 - researchportal.helsinki.fi
A similarity join aims to find all similar pairs between two collections of records. Established
algorithms utilise different similarity measures, either syntactic or semantic, to quantify the …
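A minimal sketch of a syntactic similarity join follows. It assumes whitespace tokenization and Jaccard similarity over token sets; both are illustrative choices, not the unified framework proposed by Xu and Lu. A size filter prunes pairs whose set sizes alone rule out the threshold, because Jaccard(a, b) can never exceed min(|a|, |b|) / max(|a|, |b|). The function names are hypothetical.

```python
from itertools import product


def tokens(s):
    """Lower-cased whitespace tokens as a set (an assumed, purely syntactic choice)."""
    return set(s.lower().split())


def jaccard(a, b):
    """|a ∩ b| / |a ∪ b|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_join(left, right, threshold):
    """Return every (i, j, score) with Jaccard(left[i], right[j]) >= threshold."""
    left_toks = [tokens(s) for s in left]
    right_toks = [tokens(s) for s in right]
    result = []
    for (i, a), (j, b) in product(enumerate(left_toks), enumerate(right_toks)):
        small, large = sorted((len(a), len(b)))
        # Size filter: if small / large < threshold, the pair cannot qualify.
        if large and small < threshold * large:
            continue
        score = jaccard(a, b)
        if score >= threshold:
            result.append((i, j, score))
    return result
```

For example, joining `["apple iphone 12 pro", "samsung galaxy s21"]` with `["iphone 12 pro apple 128gb", "galaxy s21 ultra samsung"]` at threshold 0.7 returns `[(0, 0, 0.8), (1, 1, 0.75)]`. The pair (1, 0) is discarded by the size filter (3 tokens against 5) without computing its score.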

Cache-oblivious high-performance similarity join

M Perdacher, C Plant, C Böhm - … of the 2019 International Conference on …, 2019 - dl.acm.org
A similarity join combines vectors based on a distance condition. Typically, such algorithms
apply a filter step (by indexing or sorting) and then refine pairs of candidate vectors. In this …
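The filter-and-refine pattern described here can be sketched for a Euclidean ε-join, assuming in-memory lists of equal-dimension tuples. The filter sorts one input on its first coordinate and uses binary search to keep only candidates within ε on that axis. This is a necessary condition, because the distance is at least the gap on any single coordinate. The refine step then computes exact distances. This is a plain sort-based illustration that makes no attempt at the cache-oblivious design the paper targets; `epsilon_join` is a hypothetical name.

```python
import bisect
import math


def epsilon_join(A, B, eps):
    """Return sorted (i, j) pairs with Euclidean distance(A[i], B[j]) <= eps."""
    # Sort B's indices by first coordinate so candidate ranges are contiguous.
    order = sorted(range(len(B)), key=lambda j: B[j][0])
    keys = [B[j][0] for j in order]
    pairs = []
    for i, a in enumerate(A):
        # Filter: only B points with first coordinate in [a0 - eps, a0 + eps].
        lo = bisect.bisect_left(keys, a[0] - eps)
        hi = bisect.bisect_right(keys, a[0] + eps)
        for j in order[lo:hi]:
            # Refine: exact distance check on the surviving candidates.
            if math.dist(a, B[j]) <= eps:
                pairs.append((i, j))
    return sorted(pairs)
```

With `A = [(0.0, 0.0), (5.0, 5.0)]`, `B = [(0.5, 0.5), (5.2, 4.9), (0.9, 3.0)]` and `eps = 1.0`, the result is `[(0, 0), (1, 1)]`. `B[2]` passes the filter for `A[0]` (its first coordinate is 0.9) but is rejected in the refine step, since its full distance is about 3.13. `math.dist` requires Python 3.8 or later.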