Survey of post-OCR processing approaches

TTH Nguyen, A Jatowt, M Coustaty… - ACM Computing Surveys …, 2021 - dl.acm.org
Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …

[BOOK] Natural language processing for historical texts

M Piotrowski - 2012 - books.google.com
More and more historical texts are becoming available in digital form. Digitization of paper
documents is motivated by the aim of preserving cultural heritage and making it more …

[PDF] The construction of a 500-million-word reference corpus of contemporary written Dutch

N Oostdijk, M Reynaert, V Hoste… - Essential speech and …, 2013 - library.oapen.org
Around the turn of the century the Dutch Language Union commissioned a survey that aimed
to take stock of the availability of basic language resources for the Dutch language …

Natural language processing for cultural heritage domains

C Sporleder - Language and Linguistics Compass, 2010 - Wiley Online Library
Museums, archives, libraries and other cultural heritage institutes maintain large collections
of artefacts, which are valuable knowledge sources for both experts and interested lay …

Deep statistical analysis of OCR errors for effective post-OCR processing

A Jatowt, M Coustaty, NV Nguyen… - 2019 ACM/IEEE Joint …, 2019 - ieeexplore.ieee.org
Post-OCR is an important processing step that follows optical character recognition (OCR)
and is meant to improve the quality of OCR documents by detecting and correcting residual …

An OCR post-correction approach using deep learning for processing medical reports

S Karthikeyan, AGS de Herrera… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
According to a recent Deloitte study, the COVID-19 pandemic continues to place a huge
strain on the global health care sector. COVID-19 has also catalysed digital transformation …

Multi-modular domain-tailored OCR post-correction

S Schulz, J Kuhn - Proceedings of the 2017 Conference on …, 2017 - aclanthology.org
One of the main obstacles for many Digital Humanities projects is the low data availability.
Texts have to be digitized in an expensive and time-consuming process, whereas Optical …

A two-step approach for automatic OCR post-correction

R Schaefer, C Neudecker - Proceedings of the 4th Joint SIGHUM …, 2020 - aclanthology.org
The quality of Optical Character Recognition (OCR) is a key factor in the digitisation
of historical documents. OCR errors are a major obstacle for downstream tasks and have …

Customised OCR correction for historical medical text

P Thompson, J McNaught, S Ananiadou - 2015 Digital Heritage, 2015 - ieeexplore.ieee.org
Historical text archives constitute a rich and diverse source of information, which is
becoming increasingly readily accessible, owing to large-scale digitisation efforts …

Improving OCR accuracy for classical critical editions

F Boschetti, M Romanello, A Babeu, D Bamman… - Research and Advanced …, 2009 - Springer
This paper describes a work-flow designed to populate a digital library of ancient Greek
critical editions with highly accurate OCR scanned text. While the most recently available …