Vision transformer for fast and efficient scene text recognition

TrOCR: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org

Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

Cited by 345 · 4 versions

Text detection, recognition, and script identification in natural scene images: A Review

V Naosekpam, N Sahu - International Journal of Multimedia Information …, 2022 - Springer

Text in natural scene images plays a vital role in scene understanding. It contains a rich and
abundant amount of valuable semantic information useful in many applications such as …

Cited by 28

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European Conference on Computer Vision, 2022 - Springer

Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

Cited by 158 · 6 versions

SVTR: Scene text recognition with a single visual model

Y Du, Z Chen, C Jia, X Yin, T Zheng, C Li, Y Du… - arXiv preprint arXiv …, 2022 - arxiv.org

Dominant scene text recognition models commonly contain two building blocks, a visual
model for feature extraction and a sequence model for text transcription. This hybrid …

Cited by 161 · 5 versions

Multi-granularity prediction for scene text recognition

P Wang, C Da, C Yao - European Conference on Computer Vision, 2022 - Springer

Scene text recognition (STR) has been an active research topic in computer vision for years.
To tackle this challenging problem, numerous innovative methods have been successively …

Cited by 58 · 5 versions

DAN: a segmentation-free document attention network for handwritten document recognition

D Coquenet, C Chatelain… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Unconstrained handwritten text recognition is a challenging computer vision task. It is
traditionally handled by a two-step approach, combining line segmentation followed by text …

Cited by 77 · 10 versions

Multi-modal text recognition networks: Interactive enhancements between visual and semantic features

B Na, Y Kim, S Park - European Conference on Computer Vision, 2022 - Springer

Linguistic knowledge has brought great benefits to scene text recognition by providing
semantics to refine character sequences. However, since linguistic knowledge has been …

Cited by 58 · 5 versions

Levenshtein OCR

C Da, P Wang, C Yao - European Conference on Computer Vision, 2022 - Springer

A novel scene text recognizer based on Vision-Language Transformer (VLT) is presented.
Inspired by Levenshtein Transformer in the area of NLP, the proposed method (named …

Cited by 36 · 5 versions

LISTER: Neighbor decoding for length-insensitive scene text recognition

C Cheng, P Wang, C Da, Q Zheng… - Proceedings of the …, 2023 - openaccess.thecvf.com

The diversity in length constitutes a significant characteristic of text. Due to the long-tail
distribution of text lengths, most existing methods for scene text recognition (STR) only work …

Cited by 13 · 5 versions

DTrOCR: Decoder-only transformer for optical character recognition

M Fujitake - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com

Typical text recognition methods rely on an encoder-decoder structure, in which the encoder
extracts features from an image, and the decoder produces recognized text from these …

Cited by 23 · 7 versions