The Newspaper Navigator dataset: extracting and analyzing visual content from 16 million historic newspaper pages in Chronicling America
BCG Lee, J Mears, E Jakeway, M Ferriter… - arXiv preprint arXiv …, 2020 - arxiv.org
Chronicling America is a product of the National Digital Newspaper Program, a partnership
between the Library of Congress and the National Endowment for the Humanities to digitize …
An Efficient Unsupervised Approach for OCR Error Correction of Vietnamese OCR Text
Different types of OCR errors often occur in OCR texts due to the low quality of scanned
document images or limitations in OCR software. In this paper, we propose a novel …
OCR error correction using correction patterns and self-organizing migrating algorithm
QD Nguyen, DA Le, NM Phan, I Zelinka - Pattern Analysis and …, 2021 - Springer
Optical character recognition (OCR) systems help to digitize paper-based historical
archives. However, poor quality of scanned documents and limitations of text recognition …
VSEC: Transformer-based model for Vietnamese spelling correction
Spelling error correction is one of topics which have a long history in natural language
processing. Although previous studies have achieved remarkable results, challenges still …
Toward a period-specific optimized neural network for OCR error correction of historical Hebrew texts
O Suissa, M Zhitomirsky-Geffet… - ACM Journal on …, 2022 - dl.acm.org
Over the past few decades, large archives of paper-based historical documents, such as
books and newspapers, have been digitized using the Optical Character Recognition (OCR) …
A Combination of BERT and Transformer for Vietnamese Spelling Correction
Recently, many studies have shown the efficiency of using Bidirectional Encoder
Representations from Transformers (BERT) in various Natural Language Processing (NLP) …
A Combination of BERT and Transformer for Vietnamese Spelling Correction
HN Trung, DT Ham, T Huynh, K Hoang - arXiv preprint arXiv:2405.02573, 2024 - arxiv.org
Recently, many studies have shown the efficiency of using Bidirectional Encoder
Representations from Transformers (BERT) in various Natural Language Processing (NLP) …
OCR error correction for Vietnamese handwritten text using neural machine translation
OCR post-processing is an important step for improving the quality of OCR output texts.
Long short-term memory (LSTM) is a deep learning model, which has wide-range …
Medication Extraction and Drug Interaction Chatbot: Generative Pretrained Transformer-Powered Chatbot for Drug-Drug Interaction
WT Kim, J Shin, IS Yoo, JW Lee, HJ Jeon… - Mayo Clinic …, 2024 - Elsevier
Objective To assist individuals, particularly cancer patients or those with complex
comorbidities, in quickly identifying potentially contraindicated medications when taking …
Statistical post-processing approaches for OCR texts
Low-quality Optical Character Recognition systems often result in different kinds of
errors in OCR-generated texts. Hence, OCR error detection and correction are essential …