Survey of post-OCR processing approaches
Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …
[Book] Natural language processing for historical texts
M Piotrowski - 2012 - books.google.com
More and more historical texts are becoming available in digital form. Digitization of paper
documents is motivated by the aim of preserving cultural heritage and making it more …
[PDF] The construction of a 500-million-word reference corpus of contemporary written Dutch
Around the turn of the century the Dutch Language Union commissioned a survey that aimed
to take stock of the availability of basic language resources for the Dutch language …
Natural language processing for cultural heritage domains
C Sporleder - Language and Linguistics Compass, 2010 - Wiley Online Library
Museums, archives, libraries and other cultural heritage institutes maintain large collections
of artefacts, which are valuable knowledge sources for both experts and interested lay …
Deep statistical analysis of OCR errors for effective post-OCR processing
Post-OCR is an important processing step that follows optical character recognition (OCR)
and is meant to improve the quality of OCR documents by detecting and correcting residual …
An OCR post-correction approach using deep learning for processing medical reports
S Karthikeyan, AGS de Herrera… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
According to a recent Deloitte study, the COVID-19 pandemic continues to place a huge
strain on the global health care sector. COVID-19 has also catalysed digital transformation …
Multi-modular domain-tailored OCR post-correction
S Schulz, J Kuhn - Proceedings of the 2017 Conference on …, 2017 - aclanthology.org
One of the main obstacles for many Digital Humanities projects is the low data availability.
Texts have to be digitized in an expensive and time-consuming process whereas Optical …
A two-step approach for automatic OCR post-correction
R Schaefer, C Neudecker - Proceedings of the 4th Joint SIGHUM …, 2020 - aclanthology.org
The quality of Optical Character Recognition (OCR) is a key factor in the digitisation
of historical documents. OCR errors are a major obstacle for downstream tasks and have …
Customised OCR correction for historical medical text
Historical text archives constitute a rich and diverse source of information, which is
becoming increasingly readily accessible, owing to large-scale digitisation efforts …
Improving OCR accuracy for classical critical editions
This paper describes a work-flow designed to populate a digital library of ancient Greek
critical editions with highly accurate OCR scanned text. While the most recently available …