Data quality: The other face of big data

B Saha, D Srivastava - 2014 IEEE 30th international conference …, 2014 - ieeexplore.ieee.org
In our Big Data era, data is being generated, collected and analyzed at an unprecedented
scale, and data-driven decision making is sweeping through all aspects of society. Recent …

Semantic web for the legal domain: the next step

P Casanovas, M Palmirani, S Peroni… - Semantic …, 2016 - content.iospress.com
Ontology-driven systems with reasoning capabilities in the legal field are now better
understood. Legal concepts are not discrete, but make up a dynamic continuum between …

An efficient similarity-based approach for comparing XML documents

A Oliveira, G Tessarolli, G Ghiotto, B Pinto… - Information Systems, 2018 - Elsevier
XML documents are widely used to interchange information among heterogeneous systems,
ranging from office applications to scientific experiments. Independently of the domain, XML …

Language edit distance and maximum likelihood parsing of stochastic grammars: Faster algorithms and connection to fundamental graph problems

B Saha - 2015 IEEE 56th Annual Symposium on Foundations of …, 2015 - ieeexplore.ieee.org
Given a context free language G over alphabet Σ and a string s ∈ Σ*, the language edit
distance problem seeks the minimum number of edits (insertions, deletions and …

The Dyck language edit distance problem in near-linear time

B Saha - 2014 IEEE 55th Annual Symposium on Foundations of …, 2014 - ieeexplore.ieee.org
Given a string σ over alphabet Σ and a grammar G defined over the same alphabet, how
many minimum number of repairs (insertions, deletions and substitutions) are required to …

Fast & space-efficient approximations of language edit distance and RNA folding: An amnesic dynamic programming approach

B Saha - 2017 IEEE 58th Annual Symposium on Foundations of …, 2017 - ieeexplore.ieee.org
Dynamic programming is a basic, and one of the most systematic techniques for developing
polynomial time algorithms with overwhelming applications. However, it often suffers from …

On repairing structural problems in semi-structured data

F Korn, B Saha, D Srivastava, S Ying - Proceedings of the VLDB …, 2013 - dl.acm.org
Semi-structured data such as XML are popular for data interchange and storage. However,
many XML documents have improper nesting where open- and close-tags are unmatched …

Big data validation case study

C Xie, J Gao, C Tao - … third international conference on big data …, 2017 - ieeexplore.ieee.org
With the advent of big data, data is being generated, collected, transformed, processed and
analyzed at an unprecedented scale. Since data is created at a fast velocity and with a large …

Learning schemas for unordered XML

R Ciucanu, S Staworko - arXiv preprint arXiv:1307.6348, 2013 - arxiv.org
We consider unordered XML, where the relative order among siblings is ignored, and we
investigate the problem of learning schemas from examples given by the user. We focus on …

Dynamic labeling scheme for XML updates

J Liu, XX Zhang - Knowledge-Based Systems, 2016 - Elsevier
Nowadays several labeling schemes are proposed to facilitate XML query processing, in
which structural relationships among nodes could be quickly determined without accessing …