String similarity search and join: a survey

M Yu, G Li, D Deng, J Feng - Frontiers of Computer Science, 2016 - Springer
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …

[PDF] An Important Aspect of Big Data: Data Usability

Li Jianzhong, Liu Xianmin - Journal of Computer Research and Development (计算机研究与发展), 2013 - cs.sjtu.edu.cn
Abstract …

Pass-join: A partition-based method for similarity joins

G Li, D Deng, J Wang, J Feng - arXiv preprint arXiv:1111.7171, 2011 - arxiv.org
As an essential operation in data cleaning, the similarity join has attracted considerable
attention from the database community. In this paper, we study string similarity joins with edit …

Approximate data instance matching: a survey

CF Dorneles, R Gonçalves… - … and Information Systems, 2011 - Springer
Approximate data matching is a central problem in several data management processes,
such as data integration, data cleaning, approximate queries, similarity search and so on. An …

Fast-join: An efficient method for fuzzy token matching based string similarity join

J Wang, G Li, J Feng - 2011 IEEE 27th International Conference …, 2011 - ieeexplore.ieee.org
String similarity join that finds similar string pairs between two string sets is an essential
operation in many applications, and has attracted significant attention recently in the …

A semi-NMF-PCA unified framework for data clustering

K Allab, L Labiod, M Nadif - IEEE Transactions on Knowledge …, 2016 - ieeexplore.ieee.org
In this work, we propose a novel way to consider the clustering and the reduction of the
dimension simultaneously. Indeed, our approach takes advantage of the mutual …

Understand short texts by harvesting and analyzing semantic knowledge

W Hua, Z Wang, H Wang, K Zheng… - IEEE transactions on …, 2016 - ieeexplore.ieee.org
Understanding short texts is crucial to many applications, but challenges abound. First, short
texts do not always observe the syntax of a written language. As a result, traditional natural …

Efficient exact edit similarity query processing with the asymmetric signature scheme

J Qin, W Wang, Y Lu, C Xiao, X Lin - Proceedings of the 2011 ACM …, 2011 - dl.acm.org
Given a query string Q, an edit similarity search finds all strings in a database whose edit
distance to Q is no more than a given threshold t. Most existing methods for answering edit …
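
The problem statement in this snippet can be illustrated with a brute-force baseline. The sketch below is not the asymmetric signature scheme of the paper; it assumes plain Levenshtein distance and a simple length filter, and the names `edit_distance` and `edit_similarity_search` are hypothetical.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity_search(query: str, database: Iterable[str], t: int) -> List[str]:
    """Return every string whose edit distance to `query` is at most `t`.

    Brute-force scan (an assumption for illustration, not an indexed method).
    """
    results = []
    for s in database:
        # Length filter: a length gap larger than t already implies distance > t.
        if abs(len(s) - len(query)) > t:
            continue
        if edit_distance(query, s) <= t:
            results.append(s)
    return results


if __name__ == "__main__":
    db = ["survey", "surveys", "surgery", "join"]
    print(edit_similarity_search("survey", db, 2))
```

A scan like this costs one dynamic program per candidate, which is the cost that signature-based and partition-based methods aim to avoid.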

Efficient approximate entity matching using Jaro-Winkler distance

Y Wang, J Qin, W Wang - International conference on web information …, 2017 - Springer
Jaro-Winkler distance is a measure of the similarity between two strings.
Since Jaro-Winkler distance performs well in matching personal and entity names, it is …

[PDF] Simple and efficient algorithm for approximate dictionary matching

N Okazaki, J Tsujii - … of the 23rd International Conference on …, 2010 - aclanthology.org
This paper presents a simple and efficient algorithm for approximate dictionary matching
designed for similarity measures such as cosine, Dice, Jaccard, and overlap coefficients. We …
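
For reference, the four set-based measures named in this snippet can be sketched as follows. This only shows their textbook definitions and is not the matching algorithm of the paper; representing strings as sets of character trigrams is an assumption, and the helper names `ngrams` and `set_similarities` are hypothetical.

```python
import math
from typing import Dict, Set


def ngrams(s: str, n: int = 3) -> Set[str]:
    """Set of character n-grams of `s` (no padding; short strings yield {s})."""
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}


def set_similarities(x: Set[str], y: Set[str]) -> Dict[str, float]:
    """Cosine, Dice, Jaccard, and overlap coefficients for two feature sets."""
    if not x or not y:
        # Convention chosen here: similarity involving an empty set is 0.
        return {"cosine": 0.0, "dice": 0.0, "jaccard": 0.0, "overlap": 0.0}
    inter = len(x & y)
    return {
        "cosine": inter / math.sqrt(len(x) * len(y)),
        "dice": 2 * inter / (len(x) + len(y)),
        "jaccard": inter / len(x | y),
        "overlap": inter / min(len(x), len(y)),
    }


if __name__ == "__main__":
    print(set_similarities(ngrams("abcd"), ngrams("abce")))
```

All four measures depend only on the intersection size and the two set sizes, so a threshold on any of them can be turned into a minimum required number of shared features.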