Efficient similarity-based operations for data integration

J Bleiholder, F Naumann - ACM computing surveys (CSUR), 2009 - dl.acm.org

The development of the Internet in recent years has made it possible and useful to access
many different information systems anywhere in the world to obtain information. While there …

Cited by 893 Related articles All 9 versions

[PDF] academia.edu

Semantic text similarity using corpus-based word similarity and string similarity

A Islam, D Inkpen - ACM Transactions on Knowledge Discovery from …, 2008 - dl.acm.org

We present a method for measuring the semantic similarity of texts using a corpus-based
measure of semantic word similarity and a normalized and modified version of the Longest …

Cited by 743 Related articles All 11 versions

Approximate data instance matching: a survey

CF Dorneles, R Gonçalves… - … and Information Systems, 2011 - Springer

Approximate data matching is a central problem in several data management processes,
such as data integration, data cleaning, approximate queries, similarity search and so on. An …

Cited by 80 Related articles All 12 versions

[PDF] scielo.org.mx

Semantic textual similarity methods, tools, and applications: A survey

G Majumder, P Pakray, A Gelbukh, D Pinto - Computación y Sistemas, 2016 - scielo.org.mx

Abstract Measuring Semantic Textual Similarity (STS), between words/terms, sentences,
paragraph and document plays an important role in computer science and computational …

Cited by 124 Related articles All 12 versions

[PDF] hu-berlin.de

[BOOK][B] Conflict handling strategies in an integrated information system

J Bleiholder, F Naumann - 2006 - edoc.hu-berlin.de

Integrated information systems provide users and applications with a unified view of
heterogeneous data sources. To provide a single consistent result for every object …

Cited by 119 Related articles All 12 versions

[PDF] academia.edu

Hermes: Data Web search on a pay-as-you-go integration infrastructure

T Tran, H Wang, P Haase - Journal of Web Semantics, 2009 - Elsevier

The Web as a global information space is developing from a Web of documents to a Web of
data. This development opens new ways for addressing complex information needs. Search …

Cited by 105 Related articles All 11 versions

Data Fusion in Three Steps: Resolving Schema, Tuple, and Value Inconsistencies.

F Naumann, A Bilke, J Bleiholder, M Weis - IEEE Data Eng. Bull., 2006 - sites.computer.org

Heterogeneous and dirty data is abundant. It is stored under different, often opaque
schemata, it represents identical real-world objects multiple times, causing duplicates, and it …

Cited by 99 Related articles

[HTML] sciencedirect.com

[HTML][HTML] Anatomy of data integration

O Brazhnik, JF Jones - Journal of biomedical informatics, 2007 - Elsevier

Producing reliable information is the ultimate goal of data processing. The ocean of data
created with the advances of science and technologies calls for integration of data coming …

Cited by 89 Related articles All 7 versions

[PDF] researchgate.net

Similarity queries: their conceptual evaluation, transformations, and processing

YN Silva, WG Aref, PA Larson, SS Pearson, MH Ali - The VLDB Journal, 2013 - Springer

Many application scenarios can significantly benefit from the identification and processing of
similarities in the data. Even though some work has been done to extend the semantics of …

Cited by 55 Related articles All 21 versions

[PDF] washington.edu

Similarity group-by

YN Silva, WG Aref, MH Ali - 2009 IEEE 25th International …, 2009 - ieeexplore.ieee.org

Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision
support systems. In many application scenarios, it is required to group similar but not …

Cited by 70 Related articles All 13 versions