GraphGuard: Enhancing Data Quality in Knowledge Graph Pipelines

R Dorsch, M Freund, J Fries, A Harth - SemIIM, 2023 - ceur-ws.org
Abstract
We present GraphGuard, a data validation framework for improving data quality in pipelines that populate knowledge graphs. The inputs for these pipelines often come from different sources, requiring various approaches for validating the data against different defects. This requirement leads to different formats for validation reports, which degrades the contextual, representational, and accessibility quality dimensions of data validation. The proposed framework consists of QualityContracts and Guardians. QualityContracts encapsulate the necessary data validation requirements in both human-readable and machine-readable formats. Software agents, called Guardians, use the machine-readable format to execute validation methods. We validate the practicality of our framework on a data processing pipeline deployed at a large European airport, using several months of data. A comparative analysis between a basic data processing pipeline and a pipeline using our framework showed improvements in the data quality criteria of believability, interpretability, ease of understanding, consistency of representation, conciseness of representation, and accessibility.
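
The division of labor between QualityContracts and Guardians might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the class layouts, the rule names, the `ValidationReport` type, and the sample flight records are all hypothetical, chosen to show how a single contract can carry both a human-readable description and machine-readable rules, and how one agent can emit reports in one uniform format.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Record = dict[str, Any]


@dataclass
class QualityContract:
    """Hypothetical contract: human-readable text plus machine-readable rules."""
    name: str
    description: str  # the human-readable side of the contract
    rules: dict[str, Callable[[Record], bool]] = field(default_factory=dict)


@dataclass
class ValidationReport:
    """Hypothetical uniform report format shared by every validation run."""
    contract: str
    record_index: int
    failed_rules: list[str]


class Guardian:
    """Hypothetical agent that executes the rules of one contract."""

    def __init__(self, contract: QualityContract) -> None:
        self.contract = contract

    def validate(self, records: list[Record]) -> list[ValidationReport]:
        reports = []
        for index, record in enumerate(records):
            # Collect the names of all rules that this record violates.
            failed = [
                rule_name
                for rule_name, rule in self.contract.rules.items()
                if not rule(record)
            ]
            if failed:
                reports.append(
                    ValidationReport(self.contract.name, index, failed)
                )
        return reports


# Made-up example data; not taken from the paper's airport deployment.
contract = QualityContract(
    name="flight-record-contract",
    description="Every flight record needs a flight id and an assigned gate.",
    rules={
        "flight_id_present": lambda r: bool(r.get("flight_id")),
        "gate_present": lambda r: r.get("gate") is not None,
    },
)
records = [
    {"flight_id": "XX123", "gate": "A1"},
    {"flight_id": "XX456", "gate": None},
]
reports = Guardian(contract).validate(records)
```

In this sketch, only the second record produces a report, because its gate is missing. Keeping the report type independent of the data source is one possible way to obtain the consistent representation the abstract refers to.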