Representation and correlation enhanced encoder-decoder framework for scene text recognition

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org

Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

Cited by 423 · Related articles · All 4 versions

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer

Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

Cited by 198 · Related articles · All 6 versions

CLIP4STR: A simple baseline for scene text recognition with pre-trained vision-language model

S Zhao, R Quan, L Zhu, Y Yang - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org

Pre-trained vision-language models (VLMs) are the de-facto foundation models for various
downstream tasks. However, scene text recognition methods still prefer backbones pre …

Cited by 29 · Related articles · All 2 versions

Text recognition model based on multi-scale fusion CRNN

L Zou, Z He, K Wang, Z Wu, Y Wang, G Zhang, X Wang - Sensors, 2023 - mdpi.com

Scene text recognition is a crucial area of research in computer vision. However, current
mainstream scene text recognition models suffer from incomplete feature extraction due to …

Cited by 7 · Related articles · All 9 versions

A review of optical text recognition from distorted scene image

OO Sumady, BJ Antoni, R Nasuta… - 2022 4th International …, 2022 - ieeexplore.ieee.org

The growing number of images with text taken from a natural position increases the amount
of text distortion. Some challenges come because of distortion, curvature, or blur which …

Cited by 2 · Related articles · All 2 versions

Soft-edge-guided significant coordinate attention network for scene text image super-resolution

C Xi, K Zhang, X He, Y Hu, J Chen - The Visual Computer, 2024 - Springer

Scene text image super-resolution (STISR) aims to enhance the resolution and visual quality
of low-resolution scene text images, thereby improving the performance of some text-related …

Cited by 2 · Related articles

ViTSTR-Transducer: Cross-Attention-Free Vision Transformer Transducer for Scene Text Recognition

R Buoy, M Iwamura, S Srun, K Kise - Journal of Imaging, 2023 - mdpi.com

Attention-based encoder–decoder scene text recognition (STR) architectures have been
proven effective in recognizing text in the real world, thanks to their ability to learn an internal …

Cited by 1 · Related articles · All 7 versions

Towards reduced-complexity scene text recognition (RCSTR) through a novel salient feature selection

R Buoy, M Iwamura, S Srun, K Kise - International Journal on Document …, 2024 - Springer

The integration of an attention mechanism has played a crucial role in many recent scene
text recognition (STR) methods. It enables the capture of spatial feature dependencies …

Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models

B Bianchi, A Agrawal, S Dehaene, E Chemla… - arXiv preprint arXiv …, 2024 - arxiv.org

Human readers can accurately count how many letters are in a word (e.g., 7 in "buffalo"),
remove a letter from a given position (e.g., "bufflo") or add a new one. The human brain of …

PARSTR: partially autoregressive scene text recognition

R Buoy, M Iwamura, S Srun, K Kise - International Journal on Document …, 2024 - Springer

An autoregressive (AR) decoder for scene text recognition (STR) requires numerous
generation steps to decode a text image character by character but can yield high …

Cited by 1 · Related articles