Deep entity matching: Challenges and opportunities

Y Li, J Li, Y Suhara, J Wang, W Hirota… - Journal of Data and …, 2021 - dl.acm.org
Entity matching refers to the task of determining whether two different representations refer
to the same real-world entity. It continues to be a prevalent problem for many organizations …

Deep learning for blocking in entity matching: a design space exploration

S Thirumuruganathan, H Li, N Tang… - Proceedings of the …, 2021 - dl.acm.org
Entity matching (EM) finds data instances that refer to the same real-world entity. Most EM
solutions perform blocking then matching. Many works have applied deep learning (DL) to …

Big graphs: challenges and opportunities

W Fan - Proceedings of the VLDB Endowment, 2022 - dl.acm.org
Big data is typically characterized with 4V's: Volume, Velocity, Variety and Veracity. When it
comes to big graphs, these challenges become even more staggering. Each and every of …

[BOOK][B] The four generations of entity resolution

Entity Resolution (ER) lies at the core of data integration and cleaning and, thus, a bulk of
the research examines ways for improving its effectiveness and time efficiency. The initial …

RPT: relational pre-trained transformer is almost all you need towards democratizing data preparation

N Tang, J Fan, F Li, J Tu, X Du, G Li, S Madden… - arXiv preprint arXiv …, 2020 - arxiv.org
Can AI help automate human-easy but computer-hard data preparation tasks that burden
data scientists, practitioners, and crowd workers? We answer this question by presenting …

Pre-trained embeddings for entity resolution: an experimental analysis

A Zeakis, G Papadakis, D Skoutas… - Proceedings of the VLDB …, 2023 - dl.acm.org
Many recent works on Entity Resolution (ER) leverage Deep Learning techniques involving
language models to improve effectiveness. This is applied to both main steps of ER, i.e., …

Machine learning and data cleaning: Which serves the other?

IF Ilyas, T Rekatsinas - ACM Journal of Data and Information Quality …, 2022 - dl.acm.org
The last few years witnessed significant advances in building automated or semi-automated
data quality, data cleaning and data integration systems powered by machine learning (ML) …

CleanML: A study for evaluating the impact of data cleaning on ML classification tasks

P Li, X Rao, J Blase, Y Zhang, X Chu… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Data quality affects machine learning (ML) model performances, and data scientists spend
considerable amount of time on data cleaning before model training. However, to date, there …

VolcanoML: speeding up end-to-end AutoML via scalable search space decomposition

Y Li, Y Shen, W Zhang, C Zhang, B Cui - The VLDB Journal, 2023 - Springer
End-to-end AutoML has attracted intensive interests from both academia and industry which
automatically searches for ML pipelines in a space induced by feature engineering …

Entity resolution on-demand

G Simonini, L Zecchini, S Bergamaschi… - Proceedings of the …, 2022 - iris.unimore.it
Entity Resolution (ER) aims to identify and merge records that refer to the same real-world
entity. ER is typically employed as an expensive cleaning step on the entire data before …