The Newspaper Navigator dataset: extracting and analyzing visual content from 16 million historic newspaper pages in Chronicling America

BCG Lee, J Mears, E Jakeway, M Ferriter… - arXiv preprint arXiv …, 2020 - arxiv.org
Chronicling America is a product of the National Digital Newspaper Program, a partnership
between the Library of Congress and the National Endowment for the Humanities to digitize …

An Efficient Unsupervised Approach for OCR Error Correction of Vietnamese OCR Text

QD Nguyen, NM Phan, P Krömer, DA Le - IEEE Access, 2023 - ieeexplore.ieee.org
Different types of OCR errors often occur in OCR texts due to the low quality of scanned
document images or limitations in OCR software. In this paper, we propose a novel …

OCR error correction using correction patterns and self-organizing migrating algorithm

QD Nguyen, DA Le, NM Phan, I Zelinka - Pattern Analysis and …, 2021 - Springer
Optical character recognition (OCR) systems help to digitize paper-based historical
archives. However, poor quality of scanned documents and limitations of text recognition …

VSEC: Transformer-based model for Vietnamese spelling correction

DT Do, HT Nguyen, TN Bui, HD Vo - … 8–12, 2021, Proceedings, Part II 18, 2021 - Springer
Spelling error correction is one of the topics with a long history in natural language
processing. Although previous studies have achieved remarkable results, challenges still …

Toward a period-specific optimized neural network for OCR error correction of historical Hebrew texts

O Suissa, M Zhitomirsky-Geffet… - ACM Journal on …, 2022 - dl.acm.org
Over the past few decades, large archives of paper-based historical documents, such as
books and newspapers, have been digitized using the Optical Character Recognition (OCR) …

A Combination of BERT and Transformer for Vietnamese Spelling Correction

TH Ngo, HD Tran, T Huynh, K Hoang - Asian Conference on Intelligent …, 2022 - Springer
Recently, many studies have shown the efficiency of using Bidirectional Encoder
Representations from Transformers (BERT) in various Natural Language Processing (NLP) …

A Combination of BERT and Transformer for Vietnamese Spelling Correction

HN Trung, DT Ham, T Huynh, K Hoang - arXiv preprint arXiv:2405.02573, 2024 - arxiv.org
Recently, many studies have shown the efficiency of using Bidirectional Encoder
Representations from Transformers (BERT) in various Natural Language Processing (NLP) …

OCR error correction for Vietnamese handwritten text using neural machine translation

DQ Nguyen, AD Le, MN Phan, P Kromer… - AIP Conference …, 2021 - pubs.aip.org
OCR post-processing is an important step for improving the quality of OCR output texts.
Long short-term memory (LSTM) is a deep learning model, which has wide-range …

Medication Extraction and Drug Interaction Chatbot: Generative Pretrained Transformer-Powered Chatbot for Drug-Drug Interaction

WT Kim, J Shin, IS Yoo, JW Lee, HJ Jeon… - Mayo Clinic …, 2024 - Elsevier
Objective: To assist individuals, particularly cancer patients or those with complex
comorbidities, in quickly identifying potentially contraindicated medications when taking …

Statistical post-processing approaches for OCR texts

QD Nguyen, DA Le, NM Phan, NT Phan… - … of International Joint …, 2022 - Springer
Low-quality Optical Character Recognition systems often result in different kinds of
errors in OCR-generated texts. Hence, OCR error detection and correction are essential …