Still not there? Comparing traditional sequence-to-sequence models to encoder-decoder neural...

TTH Nguyen, A Jatowt, M Coustaty… - ACM Computing Surveys …, 2021 - dl.acm.org

Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …

Cited by 142 · All 4 versions

Neural machine translation: A review

F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org

The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …

Cited by 386 · All 7 versions

Text processing like humans do: Visually attacking and shielding NLP systems

S Eger, GG Şahin, A Rücklé, JU Lee, C Schulz… - arXiv preprint arXiv …, 2019 - arxiv.org

Visual modifications to text are often used to obfuscate offensive comments in social media
(e.g., "!d10t") or as a writing style ("1337" in "leet speak"), among other scenarios. We …

Cited by 173 · All 8 versions

Morphological inflection generation with hard monotonic attention

R Aharoni, Y Goldberg - arXiv preprint arXiv:1611.01487, 2016 - arxiv.org

We present a neural model for morphological inflection generation which employs a hard
attention mechanism, inspired by the nearly-monotonic alignment commonly found between …

Cited by 143 · All 7 versions

Reducing sequence length by predicting edit spans with large language models

M Kaneko, N Okazaki - Proceedings of the 2023 Conference on …, 2023 - aclanthology.org

Large Language Models (LLMs) have demonstrated remarkable performance in
various tasks and gained significant attention. LLMs are also used for local sequence …

Cited by 4 · All 2 versions

OCR post correction for endangered language texts

S Rijhwani, A Anastasopoulos, G Neubig - arXiv preprint arXiv …, 2020 - arxiv.org

There is little to no data available to build natural language processing models for most
endangered languages. However, textual data in these languages often exists in formats …

Cited by 51 · All 4 versions

Supervised OCR error detection and correction using statistical and neural machine translation methods

C Amrhein, S Clematide - Journal for Language Technology and …, 2018 - zora.uzh.ch

For indexing the content of digitized historical texts, optical character recognition (OCR)
errors are a hampering problem. To explore the effectivity of new strategies for OCR post …

Cited by 62 · All 4 versions

Multi-input attention for unsupervised OCR correction

R Dong, DA Smith - Proceedings of the 56th Annual Meeting of …, 2018 - aclanthology.org

We propose a novel approach to OCR post-correction that exploits repeated texts in large
corpora both as a source of noisy target outputs for unsupervised training and as a source of …

Cited by 53 · All 4 versions

Neural OCR post-hoc correction of historical corpora

L Lyu, M Koutraki, M Krickl, B Fetahu - Transactions of the Association …, 2021 - direct.mit.edu

Optical character recognition (OCR) is crucial for a deeper access to historical collections.
OCR needs to account for orthographic variations, typefaces, or language evolution (i.e., new …

Cited by 26 · All 11 versions

Lexically aware semi-supervised learning for OCR post-correction

S Rijhwani, D Rosenblum… - Transactions of the …, 2021 - direct.mit.edu

Much of the existing linguistic data in many languages of the world is locked away in non-
digitized books and documents. Optical character recognition (OCR) can be used to produce …

Cited by 17 · All 9 versions