Generalized supervised meta-blocking

L Gagliardelli, G Papadakis, G Simonini… - Proceedings of the …, 2022 - iris.unimore.it
Entity Resolution is a core data integration task that relies on Blocking to scale to large
datasets. Schema-agnostic blocking achieves very high recall, requires no domain …

Towards building live open scientific knowledge graphs

A Le-Tuan, C Franzreb, D Le Phuoc… - … Proceedings of the …, 2022 - dl.acm.org
Due to the large number and heterogeneity of data sources, it becomes increasingly difficult
to follow the research output and the scientific discourse. For example, a publication listed …

Towards Scalable Generation of Realistic Test Data for Duplicate Detection

F Panse, W Wingerath, B Wollmer - arXiv preprint arXiv:2312.17324, 2023 - arxiv.org
Due to the increasing volume, volatility, and diversity of data in virtually all areas of our lives,
the ability to detect duplicates in potentially linked data sources is more important than ever …

Leveraging Machine Learning for Effective Data Management

S Sellami - Transactions on Large-Scale Data-and Knowledge …, 2024 - Springer
The exponential growth of heterogeneous data from diverse sources, such as social media,
IoT sensors, and transactional databases, poses significant challenges for effective …

Similarity-driven Schema Transformation for Test Data Generation

F Panse, M Klettke, J Schildgen… - …, 2022 - vsis-www.informatik.uni-hamburg.de
A flexible and versed generation of test data is an important aspect in
benchmarking algorithms for data integration. This includes the generation of …

TermMatcher: Learning to Associate Business Terms with Table Schemas

T Cai, S Sheen, P Gaurav, S Bathini, AH Doan