K-join: Knowledge-aware similarity join

N Barlaug, JA Gulla - ACM Transactions on Knowledge Discovery from …, 2021 - dl.acm.org

Entity matching is the problem of identifying which records refer to the same real-world
entity. It has been actively researched for decades, and a variety of different approaches …

Cited by 143 · All 10 versions

[PDF] tsinghua.edu.cn

Human-in-the-loop data integration

G Li - Proceedings of the VLDB Endowment, 2017 - dl.acm.org

Data integration aims to integrate data in different sources and provide users with a unified
view. However, data integration cannot be completely addressed by purely automated …

Cited by 101 · All 6 versions

[PDF] researchgate.net

An approach to extracting complex knowledge patterns among concepts belonging to structured, semi-structured and unstructured sources in a data lake

PL Giudice, L Musarella, G Sofo, D Ursino - Information Sciences, 2019 - Elsevier

In this paper, we propose a new network-based model to uniformly represent the structured,
semi-structured and unstructured sources of a data lake, which is one of the newest and …

Cited by 60 · All 2 versions

Improving question answering over incomplete knowledge graphs with relation prediction

F Zhao, Y Li, J Hou, L Bai - Neural Computing and Applications, 2022 - Springer

Large-scale knowledge graphs (KGs) play a critical role in question answering over KGs
(KGs-QA). Despite their large scale, KGs suffer from incompleteness, which has fueled a lot of …

Cited by 18 · All 4 versions

[PDF] qcri.org

A novel cost-based model for data repairing

S Hao, N Tang, G Li, J He, N Ta… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org

Integrity constraint based data repairing is an iterative process consisting of two parts: detect
and group errors that violate given integrity constraints (ICs); and modify values inside each …

Cited by 47 · All 9 versions
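The two-part process in the data-repairing abstract (detect and group constraint violations, then modify values inside each group) can be illustrated with a minimal Python sketch. It assumes the integrity constraint is a single functional dependency (the hypothetical `zip -> city`) and uses a naive majority-vote repair. The function names and sample rows are hypothetical, and this is not the cost-based model proposed in the paper.

```python
from collections import Counter, defaultdict

def detect_fd_violations(rows, lhs, rhs):
    """Group rows by their LHS values; a group whose RHS values disagree violates lhs -> rhs."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in lhs)
        groups[key].append(i)
    # Keep only groups that contain more than one distinct RHS value
    return {k: idxs for k, idxs in groups.items()
            if len({rows[i][rhs] for i in idxs}) > 1}

def repair_majority(rows, lhs, rhs):
    """Naive repair (an assumption, not a cost model): overwrite RHS with the group's most common value."""
    repaired = [dict(r) for r in rows]
    for idxs in detect_fd_violations(rows, lhs, rhs).values():
        majority, _ = Counter(rows[i][rhs] for i in idxs).most_common(1)[0]
        for i in idxs:
            repaired[i][rhs] = majority
    return repaired

rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Nwe York"},
    {"zip": "94105", "city": "San Francisco"},
]
print(detect_fd_violations(rows, ["zip"], "city"))  # {('10001',): [0, 1, 2]}
print([r["city"] for r in repair_majority(rows, ["zip"], "city")])
```

With a single dependency one detect-then-repair pass suffices; with several interacting constraints a repair can create new violations, which is why the abstract describes the process as iterative.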

[PDF] acm.org

JEDI: These aren't the JSON documents you're looking for...

T Hütter, N Augsten, CM Kirsch, MJ Carey… - Proceedings of the 2022 …, 2022 - dl.acm.org

The JavaScript Object Notation (JSON) is a popular data format used in document stores to
natively support semi-structured data. In this paper, we address the problem of JSON …

Cited by 13 · All 8 versions

Privacy preserving similarity joins using MapReduce

X Ding, W Yang, KKR Choo, X Wang, H Jin - Information Sciences, 2019 - Elsevier

Similarity join is an essential operator in data processing, mining and analysis. However, it is
resource intensive and time consuming, particularly when processing big data. There is also …

Cited by 27 · All 3 versions

[PDF] shangzeyuan.com

Balance-aware distributed string similarity-based query processing system

J Sun, Z Shang, G Li, D Deng, Z Bao - Proceedings of the VLDB …, 2019 - dl.acm.org

Data analysts spend more than 80% of their time on data cleaning and integration in the whole
process of data analytics due to data errors and inconsistencies. Similarity-based query …

Cited by 18 · All 17 versions

[PDF] helsinki.fi

Towards a unified framework for string similarity joins

P Xu, J Lu - Proceedings of the VLDB Endowment, 2019 - researchportal.helsinki.fi

A similarity join aims to find all similar pairs between two collections of records. Established
algorithms utilise different similarity measures, either syntactic or semantic, to quantify the …

Cited by 16 · All 12 versions
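The Xu and Lu abstract defines a similarity join as finding all similar pairs between two collections of records. A minimal Python sketch, assuming whitespace tokenization and Jaccard similarity as the (syntactic) measure; the paper's unified framework and its semantic measures are not reproduced here, and the function names are hypothetical. The length filter relies on the bound J(a, b) ≤ min(|a|, |b|) / max(|a|, |b|).

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def similarity_join(left, right, threshold):
    """Return (i, j, sim) for every cross pair whose token-set Jaccard similarity >= threshold."""
    ltoks = [set(s.lower().split()) for s in left]
    rtoks = [set(s.lower().split()) for s in right]
    out = []
    for i, a in enumerate(ltoks):
        for j, b in enumerate(rtoks):
            # Length filter: skip pairs whose size ratio alone rules out reaching the threshold
            small, large = sorted((len(a), len(b)))
            if large and small < threshold * large:
                continue
            sim = jaccard(a, b)  # verification step on the surviving pair
            if sim >= threshold:
                out.append((i, j, sim))
    return out

left = ["data integration survey", "string similarity join"]
right = ["similarity join string", "data cleaning survey"]
print(similarity_join(left, right, 0.5))  # [(0, 1, 0.5), (1, 0, 1.0)]
```

The nested loop is quadratic; practical systems replace it with index-based candidate generation, but the threshold semantics stay the same.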

[HTML] sigmod.org

Cache-oblivious high-performance similarity join

M Perdacher, C Plant, C Böhm - … of the 2019 International Conference on …, 2019 - dl.acm.org

A similarity join combines vectors based on a distance condition. Typically, such algorithms
apply a filter step (by indexing or sorting) and then refine pairs of candidate vectors. In this …

Cited by 18 · All 8 versions
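The filter-then-refine pattern in the Perdacher, Plant and Böhm abstract can be illustrated with a sort-based epsilon join on vectors: sort one side on its first coordinate, use binary search to keep only candidates within eps on that coordinate (filter), then check the full Euclidean distance (refine). This is a minimal Python sketch under those assumptions, not the paper's cache-oblivious algorithm; `epsilon_join` and the sample points are hypothetical.

```python
import bisect
import math

def epsilon_join(A, B, eps):
    """Return sorted index pairs (i, j) with Euclidean distance(A[i], B[j]) <= eps."""
    # Sort B's indices by first coordinate so a range can be found by binary search
    order = sorted(range(len(B)), key=lambda j: B[j][0])
    keys = [B[j][0] for j in order]
    result = []
    for i, a in enumerate(A):
        # Filter: |a[0] - b[0]| <= eps is a necessary condition for dist(a, b) <= eps
        lo = bisect.bisect_left(keys, a[0] - eps)
        hi = bisect.bisect_right(keys, a[0] + eps)
        for j in order[lo:hi]:
            # Refine: exact distance check on the surviving candidates
            if math.dist(a, B[j]) <= eps:
                result.append((i, j))
    return sorted(result)

A = [(0.0, 0.0), (5.0, 5.0)]
B = [(0.5, 0.5), (5.2, 4.9), (9.0, 0.0)]
print(epsilon_join(A, B, 1.0))  # [(0, 0), (1, 1)]
```

The filter never discards a true result, because the distance on one coordinate is a lower bound on the full Euclidean distance; it only reduces how many pairs reach the refine step.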