(Almost) all of entity resolution
O Binette, RC Steorts - Science Advances, 2022 - science.org
Whether the goal is to estimate the number of people that live in a congressional district, to
estimate the number of individuals that have died in an armed conflict, or to disambiguate …
Statistical Data Integration for Health Policy Evidence-Building
Health policy evidence-building requires data sources such as health care claims, electronic
health records, probability and nonprobability survey data, epidemiological surveillance …
Multifile partitioning for record linkage and duplicate detection
S Aleshin-Guendel, M Sadinle - Journal of the American Statistical …, 2023 - Taylor & Francis
Merging datafiles containing information on overlapping sets of entities is a challenging task
in the absence of unique identifiers, and is further complicated when some entities are …
Adaptive fuzzy string matching: How to merge datasets with only one (messy) identifying field
AR Kaufman, A Klevs - Political Analysis, 2022 - cambridge.org
A single dataset is rarely sufficient to address a question of substantive interest. Instead,
most applied data analysis combines data from multiple sources. Very rarely do two datasets …
A unified framework for de-duplication and population size estimation (with discussion)
Bayesian Analysis (2020) 15, Number 2, pp. 633–682. A Unified Framework for De-Duplication …
Convergence Diagnostics for Entity Resolution
S Aleshin-Guendel, RC Steorts - Annual Review of Statistics …, 2024 - annualreviews.org
Entity resolution is the process of merging and removing duplicate records from multiple
data sources, often in the absence of unique identifiers. Bayesian models for entity …
A Primer on the Data Cleaning Pipeline
RC Steorts - Journal of Survey Statistics and Methodology, 2023 - academic.oup.com
The availability of both structured and unstructured databases, such as electronic health
data, social media data, patent data, and surveys that are often updated in real time, among …
d-blink: Distributed end-to-end Bayesian entity resolution
NG Marchant, A Kaplan, DN Elazar… - … of Computational and …, 2021 - Taylor & Francis
Entity resolution (ER; also known as record linkage or de-duplication) is the process of
merging noisy databases, often in the absence of unique identifiers. A major advancement …
Estimating the performance of entity resolution algorithms: Lessons learned through PatentsView.org
O Binette, SA York, E Hickerson, Y Baek… - The American …, 2023 - Taylor & Francis
This article introduces a novel evaluation methodology for entity resolution algorithms. It is
motivated by PatentsView.org, a public-use patent data exploration platform that …
Improving Probabilistic Record Linkage Using Statistical Prediction Models
A Moretti, N Shlomo - International Statistical Review, 2023 - Wiley Online Library
Record linkage brings together information from records in two or more data sources that are
believed to belong to the same statistical unit based on a common set of matching variables …