String similarity search and join: a survey
String similarity search and join are two important operations in data cleaning and
integration, which extend traditional exact search and exact join operations in databases by …
An important aspect of big data: data usability
J Li, X Liu - Journal of Computer Research and Development, 2013 - cs.sjtu.edu.cn
Abstract: …
Pass-join: A partition-based method for similarity joins
As an essential operation in data cleaning, the similarity join has attracted considerable
attention from the database community. In this paper, we study string similarity joins with edit …
Approximate data instance matching: a survey
CF Dorneles, R Gonçalves… - … and Information Systems, 2011 - Springer
Approximate data matching is a central problem in several data management processes,
such as data integration, data cleaning, approximate queries, similarity search and so on. An …
Fast-join: An efficient method for fuzzy token matching based string similarity join
String similarity join that finds similar string pairs between two string sets is an essential
operation in many applications, and has attracted significant attention recently in the …
A semi-NMF-PCA unified framework for data clustering
In this work, we propose a novel way to consider the clustering and the reduction of the
dimension simultaneously. Indeed, our approach takes advantage of the mutual …
Understand short texts by harvesting and analyzing semantic knowledge
Understanding short texts is crucial to many applications, but challenges abound. First, short
texts do not always observe the syntax of a written language. As a result, traditional natural …
Efficient exact edit similarity query processing with the asymmetric signature scheme
Given a query string Q, an edit similarity search finds all strings in a database whose edit
distance with Q is no more than a given threshold t. Most existing methods answering edit …
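The definition in this snippet can be illustrated with a brute-force sketch. This is not the asymmetric signature scheme named in the title; it simply compares the query against every string. The function names `edit_distance` and `edit_similarity_search` and the sample data are hypothetical and chosen only for illustration.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def edit_similarity_search(query: str, database: List[str], t: int) -> List[str]:
    """Return every string whose edit distance to `query` is at most `t`."""
    return [s for s in database if edit_distance(query, s) <= t]


# Hypothetical data: strings within edit distance 1 of the query.
matches = edit_similarity_search("survey", ["survey", "surveys", "server", "join"], 1)
```

A linear scan like this costs one quadratic-time distance computation per database string, which is the cost that filtering and indexing methods aim to avoid.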
Efficient approximate entity matching using Jaro-Winkler distance
Jaro-Winkler distance is a measure of the similarity between two strings.
Since Jaro-Winkler distance performs well in matching personal and entity names, it is …
Simple and efficient algorithm for approximate dictionary matching
N Okazaki, J Tsujii - … of the 23rd International Conference on …, 2010 - aclanthology.org
This paper presents a simple and efficient algorithm for approximate dictionary matching
designed for similarity measures such as cosine, Dice, Jaccard, and overlap coefficients. We …
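The four coefficients named in this snippet can be sketched for strings represented as feature sets. The representation used here, unpadded character bigrams, is an assumption made for illustration, as are the helper names `ngrams` and `set_similarities`; the sketch shows only the measures themselves, not the matching algorithm the paper presents.

```python
import math


def ngrams(s: str, n: int = 2) -> set:
    """Set of character n-grams of `s` (no padding; an illustrative choice)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def set_similarities(x: set, y: set) -> dict:
    """Cosine, Dice, Jaccard, and overlap coefficients of two feature sets."""
    if not x or not y:
        # Treat an empty feature set as having no similarity to anything.
        return {"cosine": 0.0, "dice": 0.0, "jaccard": 0.0, "overlap": 0.0}
    common = len(x & y)
    return {
        "cosine": common / math.sqrt(len(x) * len(y)),
        "dice": 2 * common / (len(x) + len(y)),
        "jaccard": common / len(x | y),
        "overlap": common / min(len(x), len(y)),
    }


# "night" and "nacht" share exactly one bigram, "ht".
scores = set_similarities(ngrams("night"), ngrams("nacht"))
```

All four coefficients depend only on the intersection size and the two set sizes, which is why a single algorithm can be designed to cover them together.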