Blocking and filtering techniques for entity resolution: A survey
Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …
String similarity search and join: a survey
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …
String similarity joins: An experimental evaluation
String similarity join is an important operation in data integration and cleansing that finds
similar string pairs from two collections of strings. More than ten algorithms have been …
Massjoin: A mapreduce-based method for scalable string similarity joins
String similarity join is an essential operation in data integration. The era of big data calls for
scalable algorithms to support large-scale string similarity joins. In this paper, we study …
Embedjoin: Efficient edit similarity joins via embeddings
We study the problem of edit similarity joins, where given a set of strings and a threshold
value K, we want to output all pairs of strings whose edit distances are at most K. Edit …
A pivotal prefix based filtering algorithm for string similarity search
We study the string similarity search problem with edit-distance constraints, which, given a
set of data strings and a query string, finds the similar strings to the query. Existing …
Efficient processing of graph similarity queries with edit distance constraints
Graphs are widely used to model complicated data semantics in many applications in
bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to …
Efficient similarity join and search on multi-attribute data
In this paper we study similarity join and search on multi-attribute data. Traditional methods
on single-attribute data have pruning power only on single attributes and cannot efficiently …
A survey of blocking and filtering techniques for entity resolution
Efficiency techniques are an integral part of Entity Resolution, since its infancy. In this
survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid …
Similarity query support in big data management systems
T Kim, W Li, A Behm, I Cetindil, R Vernica, V Borkar… - Information Systems, 2020 - Elsevier
Similarity query processing is becoming increasingly important in many applications such as
data cleaning, record linkage, Web search, and document analytics. In this paper we study …