Data fusion

J Bleiholder, F Naumann - ACM computing surveys (CSUR), 2009 - dl.acm.org
The development of the Internet in recent years has made it possible and useful to access
many different information systems anywhere in the world to obtain information. While there …

Semantic text similarity using corpus-based word similarity and string similarity

A Islam, D Inkpen - ACM Transactions on Knowledge Discovery from …, 2008 - dl.acm.org
We present a method for measuring the semantic similarity of texts using a corpus-based
measure of semantic word similarity and a normalized and modified version of the Longest …

Approximate data instance matching: a survey

CF Dorneles, R Gonçalves… - … and Information Systems, 2011 - Springer
Approximate data matching is a central problem in several data management processes,
such as data integration, data cleaning, approximate queries, similarity search and so on. An …

Semantic textual similarity methods, tools, and applications: A survey

G Majumder, P Pakray, A Gelbukh, D Pinto - Computación y Sistemas, 2016 - scielo.org.mx
Measuring Semantic Textual Similarity (STS), between words/terms, sentences,
paragraph and document plays an important role in computer science and computational …

[BOOK][B] Conflict handling strategies in an integrated information system

J Bleiholder, F Naumann - 2006 - edoc.hu-berlin.de
Integrated information systems provide users and applications with a unified view of
heterogeneous data sources. To provide a single consistent result for every object …

Hermes: Data Web search on a pay-as-you-go integration infrastructure

T Tran, H Wang, P Haase - Journal of Web Semantics, 2009 - Elsevier
The Web as a global information space is developing from a Web of documents to a Web of
data. This development opens new ways for addressing complex information needs. Search …

Data Fusion in Three Steps: Resolving Schema, Tuple, and Value Inconsistencies.

F Naumann, A Bilke, J Bleiholder, M Weis - IEEE Data Eng. Bull., 2006 - sites.computer.org
Heterogeneous and dirty data is abundant. It is stored under different, often opaque
schemata, it represents identical real-world objects multiple times, causing duplicates, and it …

[HTML] Anatomy of data integration

O Brazhnik, JF Jones - Journal of biomedical informatics, 2007 - Elsevier
Producing reliable information is the ultimate goal of data processing. The ocean of data
created with the advances of science and technologies calls for integration of data coming …

Similarity queries: their conceptual evaluation, transformations, and processing

YN Silva, WG Aref, PA Larson, SS Pearson, MH Ali - The VLDB Journal, 2013 - Springer
Many application scenarios can significantly benefit from the identification and processing of
similarities in the data. Even though some work has been done to extend the semantics of …

Similarity group-by

YN Silva, WG Aref, MH Ali - 2009 IEEE 25th International …, 2009 - ieeexplore.ieee.org
Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision
support systems. In many application scenarios, it is required to group similar but not …