Survey of post-OCR processing approaches

TTH Nguyen, A Jatowt, M Coustaty… - ACM Computing Surveys …, 2021 - dl.acm.org
Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …

An OCR post-correction approach using deep learning for processing medical reports

S Karthikeyan, AGS de Herrera… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
According to a recent Deloitte study, the COVID-19 pandemic continues to place a huge
strain on the global health care sector. COVID-19 has also catalysed digital transformation …

Using automated methods to detect safety problems with health information technology: a scoping review

D Surian, Y Wang, E Coiera… - Journal of the American …, 2023 - academic.oup.com
Objective: To summarize the research literature evaluating automated methods for early
detection of safety problems with health information technology (HIT). Materials and …

Automated misspelling detection and correction in Persian clinical text

A Yazdani, M Ghazisaeedi, N Ahmadinejad… - Journal of digital …, 2020 - Springer
Accurate electronic health records are important for clinical care, research, and patient
safety assurance. Correction of misspelled words is required to ensure the correct …

A hybrid solution for extracting information from unstructured data using optical character recognition (OCR) with natural language processing (NLP)

B Dash - Research Gate, 2021 - researchgate.net
With rapid digitalization, organizations are producing a lot of data as part of their day-to-day
operations. These data are stored either on their legacy platforms or in any cloud storage …

Correcting Arabic soft spelling mistakes using BiLSTM-based machine learning

GA Abandah, A Suyyagh, MZ Khedher - arXiv preprint arXiv:2108.01141, 2021 - arxiv.org
Soft spelling errors are a class of spelling mistakes that is widespread among native Arabic
speakers and foreign learners alike. Some of these errors are typographical in nature. They …

Named entity recognition for Chinese biomedical patents

Y Hu, S Verberne - … of the 28th international conference on …, 2020 - aclanthology.org
There is a large body of work on Biomedical Entity Recognition (Bio-NER) for English but
there have only been a few attempts addressing NER for Chinese biomedical texts. Because …

Generating a training corpus for OCR post-correction using encoder-decoder model

E D'hondt, C Grouin, B Grau - Proceedings of the Eighth …, 2017 - aclanthology.org
In this paper we present a novel approach to the automatic correction of OCR-induced
orthographic errors in a given text. While current systems depend heavily on large training …

Upcycle your OCR: Reusing OCRs for post-OCR text correction in Romanised Sanskrit

A Krishna, BP Majumder, RS Bhat, P Goyal - arXiv preprint arXiv …, 2018 - arxiv.org
We propose a post-OCR text correction approach for digitising texts in Romanised Sanskrit.
Owing to the lack of resources our approach uses OCR models trained for other languages …

Improving the quality of Persian clinical text with a novel spelling correction system

SMS Dashti, SF Dashti - BMC Medical Informatics and Decision Making, 2024 - Springer
Background: The accuracy of spelling in Electronic Health Records (EHRs) is a critical factor
for efficient clinical care, research, and ensuring patient safety. The Persian language, with …