Entity resolution with hierarchical graph attention networks
Entity Resolution (ER) links entities that refer to the same real-world entity from different
sources. Existing work usually takes pairs of entities as input and judges those pairs …
Effective entity matching with transformers
We present Ditto, a novel entity matching system based on pre-trained Transformer
language models. We fine-tune and cast EM as a sequence-pair classification problem to …
Let's speak trajectories
Trajectory-based applications have acquired significant attention over the past decade with
the rising size of trajectory data generated by users. However, building trajectory-based …
FlexER: flexible entity resolution for multiple intents
Entity resolution, a longstanding problem of data cleaning and integration, aims at
identifying data records that represent the same real-world entity. Existing approaches treat …
CollaborEM: A self-supervised entity matching framework using multi-features collaboration
Entity Matching (EM) aims to identify whether two tuples refer to the same real-world entity
and is well-known to be labor-intensive. It is a prerequisite to anomaly detection, as …
Parallel rule discovery from large datasets by sampling
Rule discovery from large datasets is often prohibitively costly. The problem becomes more
staggering when the rules are collectively defined across multiple tables. To scale with large …
Ember: No-Code Context Enrichment via similarity-based keyless joins
Structured data, or data that adheres to a pre-defined schema, can suffer from fragmented
context: information describing a single entity can be scattered across multiple datasets or …
Geospatial entity resolution
A geospatial database is today at the core of an ever increasing number of services.
Building and maintaining it remains challenging due to the need to merge information from …
Discovering Top-k Rules using Subjective and Objective Criteria
This paper studies two questions about rule discovery. Can we characterize the usefulness
of rules using quantitative criteria? How can we discover rules using those criteria? As a …
Sent2Span: span detection for PICO extraction in the biomedical text without span annotations
The rapid growth in published clinical trials makes it difficult to maintain up-to-date
systematic reviews, which requires finding all relevant trials. This leads to policy and practice …