Entity resolution with hierarchical graph attention networks

D Yao, Y Gu, G Cong, H Jin, X Lv - Proceedings of the 2022 International …, 2022 - dl.acm.org
Entity Resolution (ER) links entities that refer to the same real-world entity from different
sources. Existing work usually takes pairs of entities as input and judges those pairs …

Effective entity matching with transformers

Y Li, J Li, Y Suhara, AH Doan, WC Tan - The VLDB Journal, 2023 - Springer
We present Ditto, a novel entity matching system based on pre-trained Transformer
language models. We fine-tune and cast EM as a sequence-pair classification problem to …

Let's speak trajectories

M Musleh, MF Mokbel, S Abbar - … of the 30th International Conference on …, 2022 - dl.acm.org
Trajectory-based applications have acquired significant attention over the past decade with
the rising size of trajectory data generated by users. However, building trajectory-based …

FlexER: flexible entity resolution for multiple intents

B Genossar, R Shraga, A Gal - Proceedings of the ACM on Management …, 2023 - dl.acm.org
Entity resolution, a longstanding problem of data cleaning and integration, aims at
identifying data records that represent the same real-world entity. Existing approaches treat …

CollaborEM: A self-supervised entity matching framework using multi-features collaboration

C Ge, P Wang, L Chen, X Liu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Entity Matching (EM) aims to identify whether two tuples refer to the same real-world entity
and is well-known to be labor-intensive. It is a prerequisite to anomaly detection, as …

Parallel rule discovery from large datasets by sampling

W Fan, Z Han, Y Wang, M Xie - … of the 2022 international conference on …, 2022 - dl.acm.org
Rule discovery from large datasets is often prohibitively costly. The problem becomes more
staggering when the rules are collectively defined across multiple tables. To scale with large …

Ember: No-Code Context Enrichment via similarity-based keyless joins

S Suri, IF Ilyas, C Ré, T Rekatsinas - arXiv preprint arXiv:2106.01501, 2021 - arxiv.org
Structured data, or data that adheres to a pre-defined schema, can suffer from fragmented
context: information describing a single entity can be scattered across multiple datasets or …

Geospatial entity resolution

P Balsebre, D Yao, G Cong, Z Hai - … of the ACM Web Conference 2022, 2022 - dl.acm.org
A geospatial database is today at the core of an ever-increasing number of services.
Building and maintaining it remains challenging due to the need to merge information from …

Discovering Top-k Rules using Subjective and Objective Criteria

W Fan, Z Han, Y Wang, M Xie - Proceedings of the ACM on Management …, 2023 - dl.acm.org
This paper studies two questions about rule discovery. Can we characterize the usefulness
of rules using quantitative criteria? How can we discover rules using those criteria? As a …

Sent2Span: span detection for PICO extraction in the biomedical text without span annotations

S Liu, Y Sun, B Li, W Wang, FT Bourgeois… - arXiv preprint arXiv …, 2021 - arxiv.org
The rapid growth in published clinical trials makes it difficult to maintain up-to-date
systematic reviews, which requires finding all relevant trials. This leads to policy and practice …