Vchunkjoin: An efficient algorithm for edit similarity joins

G Papadakis, D Skoutas, E Thanos… - ACM Computing Surveys …, 2020 - dl.acm.org

Entity Resolution (ER), a core task of Data Integration, detects different entity profiles that
correspond to the same real-world object. Due to its inherently quadratic complexity, a series …

Cited by 217 · Related articles · All 7 versions

String similarity search and join: a survey

M Yu, G Li, D Deng, J Feng - Frontiers of Computer Science, 2016 - Springer

String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …

Cited by 177 · Related articles · All 17 versions

String similarity joins: An experimental evaluation

Y Jiang, G Li, J Feng, WS Li - Proceedings of the VLDB Endowment, 2014 - dl.acm.org

String similarity join is an important operation in data integration and cleansing that finds
similar string pairs from two collections of strings. More than ten algorithms have been …

Cited by 206 · Related articles · All 11 versions

Massjoin: A mapreduce-based method for scalable string similarity joins

D Deng, G Li, S Hao, J Wang… - 2014 IEEE 30th …, 2014 - ieeexplore.ieee.org

String similarity join is an essential operation in data integration. The era of big data calls for
scalable algorithms to support large-scale string similarity joins. In this paper, we study …

Cited by 163 · Related articles · All 18 versions

Embedjoin: Efficient edit similarity joins via embeddings

H Zhang, Q Zhang - Proceedings of the 23rd ACM SIGKDD international …, 2017 - dl.acm.org

We study the problem of edit similarity joins, where given a set of strings and a threshold
value K, we want to output all pairs of strings whose edit distances are at most K. Edit …

Cited by 59 · Related articles · All 5 versions

A pivotal prefix based filtering algorithm for string similarity search

D Deng, G Li, J Feng - Proceedings of the 2014 ACM SIGMOD …, 2014 - dl.acm.org

We study the string similarity search problem with edit-distance constraints, which, given a
set of data strings and a query string, finds the similar strings to the query. Existing …

Cited by 76 · Related articles · All 14 versions

Efficient processing of graph similarity queries with edit distance constraints

X Zhao, C Xiao, X Lin, W Wang, Y Ishikawa - The VLDB Journal, 2013 - Springer

Graphs are widely used to model complicated data semantics in many applications in
bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to …

Cited by 75 · Related articles · All 7 versions

Efficient similarity join and search on multi-attribute data

G Li, J He, D Deng, J Li - Proceedings of the 2015 ACM SIGMOD …, 2015 - dl.acm.org

In this paper we study similarity join and search on multi-attribute data. Traditional methods
on single-attribute data have pruning power only on single attributes and cannot efficiently …

Cited by 53 · Related articles · All 12 versions

A survey of blocking and filtering techniques for entity resolution

G Papadakis, D Skoutas, E Thanos… - arXiv preprint arXiv …, 2019 - arxiv.org

Efficiency techniques are an integral part of Entity Resolution, since its infancy. In this
survey, we organized the bulk of works in the field into Blocking, Filtering and hybrid …

Cited by 30 · Related articles · All 5 versions

Similarity query support in big data management systems

T Kim, W Li, A Behm, I Cetindil, R Vernica, V Borkar… - Information Systems, 2020 - Elsevier

Similarity query processing is becoming increasingly important in many applications such as
data cleaning, record linkage, Web search, and document analytics. In this paper we study …

Cited by 25 · Related articles · All 4 versions