(Almost) all of entity resolution

O Binette, RC Steorts - Science Advances, 2022 - science.org
Whether the goal is to estimate the number of people that live in a congressional district, to
estimate the number of individuals that have died in an armed conflict, or to disambiguate …

Statistical Data Integration for Health Policy Evidence-Building

SM Paddock, C Franco, FJ Breidt… - Annual Review of …, 2024 - annualreviews.org
Health policy evidence-building requires data sources such as health care claims, electronic
health records, probability and nonprobability survey data, epidemiological surveillance …

Multifile partitioning for record linkage and duplicate detection

S Aleshin-Guendel, M Sadinle - Journal of the American Statistical …, 2023 - Taylor & Francis
Merging datafiles containing information on overlapping sets of entities is a challenging task
in the absence of unique identifiers, and is further complicated when some entities are …

Adaptive fuzzy string matching: How to merge datasets with only one (messy) identifying field

AR Kaufman, A Klevs - Political Analysis, 2022 - cambridge.org
A single dataset is rarely sufficient to address a question of substantive interest. Instead,
most applied data analysis combines data from multiple sources. Very rarely do two datasets …

A unified framework for de-duplication and population size estimation (with discussion)

A Tancredi, R Steorts, B Liseo - 2020 - projecteuclid.org
Bayesian Analysis (2020) 15, Number 2, pp. 633–682.

Convergence Diagnostics for Entity Resolution

S Aleshin-Guendel, RC Steorts - Annual Review of Statistics …, 2024 - annualreviews.org
Entity resolution is the process of merging and removing duplicate records from multiple
data sources, often in the absence of unique identifiers. Bayesian models for entity …

A Primer on the Data Cleaning Pipeline

RC Steorts - Journal of Survey Statistics and Methodology, 2023 - academic.oup.com
The availability of both structured and unstructured databases, such as electronic health
data, social media data, patent data, and surveys that are often updated in real time, among …

d-blink: Distributed end-to-end Bayesian entity resolution

NG Marchant, A Kaplan, DN Elazar… - … of Computational and …, 2021 - Taylor & Francis
Entity resolution (ER; also known as record linkage or de-duplication) is the process of
merging noisy databases, often in the absence of unique identifiers. A major advancement …

Estimating the performance of entity resolution algorithms: Lessons learned through PatentsView.org

O Binette, SA York, E Hickerson, Y Baek… - The American …, 2023 - Taylor & Francis
This article introduces a novel evaluation methodology for entity resolution algorithms. It is
motivated by PatentsView.org, a public-use patent data exploration platform that …

Improving Probabilistic Record Linkage Using Statistical Prediction Models

A Moretti, N Shlomo - International Statistical Review, 2023 - Wiley Online Library
Record linkage brings together information from records in two or more data sources that are
believed to belong to the same statistical unit based on a common set of matching variables …