Duplicate detection for quality assurance of document image collections

R Huber-Mörk, A Schindler, S Schlarb - Preservation of Digital Objects, 2012 - academia.edu
Abstract
Digital preservation workflows for image collections that involve automatic and semi-automatic image acquisition and processing are prone to reduced quality. We present a computer-vision method for quality assurance of scanned content. A visual dictionary derived from local image descriptors enables efficient perceptual image fingerprinting, which is used to compare scanned book pages and detect duplicated pages. A spatial verification step based on descriptor matching further increases the robustness of the approach. Results are presented for a digitized book collection of approximately 35,000 pages. Duplicated pages are identified with high reliability, in good agreement with results obtained independently by human visual inspection.
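The abstract only names the building blocks, so the following is a minimal, hedged Python sketch of the general bag-of-visual-words idea it refers to, not the authors' implementation. It assumes local descriptors (and, for the verification step, keypoint positions) have already been extracted by some detector. All function names (`kmeans`, `fingerprint`, `duplicate_candidates`, `spatial_check`), the similarity threshold of 0.9 and the pixel tolerance are hypothetical. The spatial check is a simple translation-consistency test swapped in for illustration; the paper's actual verification step may differ.

```python
import math
import random
from collections import Counter


def nearest_word(descriptor, centres):
    """Index of the closest visual word (squared Euclidean distance)."""
    return min(range(len(centres)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, centres[i])))


def kmeans(descriptors, k, iterations=20, seed=0):
    """Toy visual dictionary: k cluster centres ("visual words") via Lloyd's algorithm."""
    rng = random.Random(seed)
    centres = rng.sample(descriptors, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centres)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centre if a cluster goes empty
                dim = len(members[0])
                centres[i] = [sum(m[j] for m in members) / len(members) for j in range(dim)]
    return centres


def fingerprint(descriptors, centres):
    """L2-normalised histogram of visual-word occurrences for one page."""
    counts = Counter(nearest_word(d, centres) for d in descriptors)
    vec = [counts.get(i, 0) for i in range(len(centres))]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two fingerprints that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


def duplicate_candidates(pages, centres, threshold=0.9):
    """Return (i, j, score) for every page pair whose fingerprints are similar."""
    prints = [fingerprint(p, centres) for p in pages]
    result = []
    for i in range(len(prints)):
        for j in range(i + 1, len(prints)):
            score = cosine(prints[i], prints[j])
            if score >= threshold:
                result.append((i, j, score))
    return result


def spatial_check(kps_a, kps_b, tol=5.0):
    """Fraction of word-matched keypoints consistent with a single translation.

    kps_a / kps_b are lists of (x, y, word); only words occurring exactly once
    on each page are matched, to keep correspondences unambiguous.
    """
    count_a = Counter(w for _, _, w in kps_a)
    count_b = Counter(w for _, _, w in kps_b)
    pos_b = {w: (x, y) for x, y, w in kps_b if count_b[w] == 1}
    shifts = [(pos_b[w][0] - x, pos_b[w][1] - y)
              for x, y, w in kps_a if count_a[w] == 1 and w in pos_b]
    if not shifts:
        return 0.0
    mid = len(shifts) // 2  # median displacement as the consensus translation
    mx = sorted(s[0] for s in shifts)[mid]
    my = sorted(s[1] for s in shifts)[mid]
    inliers = sum(1 for dx, dy in shifts if abs(dx - mx) <= tol and abs(dy - my) <= tol)
    return inliers / len(shifts)
```

In this sketch, `duplicate_candidates` plays the role of the fast fingerprint comparison, and `spatial_check` would only be run on the candidate pairs it returns, so that pages sharing many visual words but with inconsistent layouts can be rejected. Comparing all pairs is quadratic in the number of pages; for a collection of tens of thousands of pages, an indexed lookup of fingerprints would likely be needed instead.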