Neural networks for entity matching: A survey
Entity matching is the problem of identifying which records refer to the same real-world
entity. It has been actively researched for decades, and a variety of different approaches …
Human-in-the-loop data integration
G Li - Proceedings of the VLDB Endowment, 2017 - dl.acm.org
Data integration aims to integrate data in different sources and provide users with a unified
view. However, data integration cannot be completely addressed by purely automated …
An approach to extracting complex knowledge patterns among concepts belonging to structured, semi-structured and unstructured sources in a data lake
In this paper, we propose a new network-based model to uniformly represent the structured,
semi-structured and unstructured sources of a data lake, which is one of the newest and …
Improving question answering over incomplete knowledge graphs with relation prediction
F Zhao, Y Li, J Hou, L Bai - Neural Computing and Applications, 2022 - Springer
Large-scale knowledge graphs (KGs) play a critical role in question answering over KGs
(KGs-QA). Despite their large scale, KGs suffer from incompleteness, which has fueled a lot of …
A novel cost-based model for data repairing
Integrity constraint based data repairing is an iterative process consisting of two parts: detect
and group errors that violate given integrity constraints (ICs); and modify values inside each …
JEDI: These aren't the JSON documents you're looking for...
The JavaScript Object Notation (JSON) is a popular data format used in document stores to
natively support semi-structured data. In this paper, we address the problem of JSON …
Privacy preserving similarity joins using MapReduce
Similarity join is an essential operator in data processing, mining and analysis. However, it is
resource intensive and time consuming, particularly when processing big data. There is also …
Balance-aware distributed string similarity-based query processing system
Data analysts spend more than 80% of time on data cleaning and integration in the whole
process of data analytics due to data errors and inconsistencies. Similarity-based query …
Towards a unified framework for string similarity joins
A similarity join aims to find all similar pairs between two collections of records. Established
algorithms utilise different similarity measures, either syntactic or semantic, to quantify the …
Cache-oblivious high-performance similarity join
A similarity join combines vectors based on a distance condition. Typically, such algorithms
apply a filter step (by indexing or sorting) and then refine pairs of candidate vectors. In this …