TrOCR: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org
Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

CLIP4STR: A simple baseline for scene text recognition with pre-trained vision-language model

S Zhao, R Quan, L Zhu, Y Yang - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org
Pre-trained vision-language models (VLMs) are the de-facto foundation models for various
downstream tasks. However, scene text recognition methods still prefer backbones pre …

Text recognition model based on multi-scale fusion CRNN

L Zou, Z He, K Wang, Z Wu, Y Wang, G Zhang, X Wang - Sensors, 2023 - mdpi.com
Scene text recognition is a crucial area of research in computer vision. However, current
mainstream scene text recognition models suffer from incomplete feature extraction due to …

A review of optical text recognition from distorted scene image

OO Sumady, BJ Antoni, R Nasuta… - 2022 4th International …, 2022 - ieeexplore.ieee.org
The growing number of images with text taken from a natural position increases the amount
of text distortion. Some challenges come because of distortion, curvature, or blur which …

Soft-edge-guided significant coordinate attention network for scene text image super-resolution

C Xi, K Zhang, X He, Y Hu, J Chen - The Visual Computer, 2024 - Springer
Scene text image super-resolution (STISR) aims to enhance the resolution and visual quality
of low-resolution scene text images, thereby improving the performance of some text-related …

ViTSTR-Transducer: Cross-Attention-Free Vision Transformer Transducer for Scene Text Recognition

R Buoy, M Iwamura, S Srun, K Kise - Journal of Imaging, 2023 - mdpi.com
Attention-based encoder–decoder scene text recognition (STR) architectures have been
proven effective in recognizing text in the real world, thanks to their ability to learn an internal …

Towards reduced-complexity scene text recognition (RCSTR) through a novel salient feature selection

R Buoy, M Iwamura, S Srun, K Kise - International Journal on Document …, 2024 - Springer
The integration of an attention mechanism has played a crucial role in many recent scene
text recognition (STR) methods. It enables the capture of spatial feature dependencies …

Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models

B Bianchi, A Agrawal, S Dehaene, E Chemla… - arXiv preprint arXiv …, 2024 - arxiv.org
Human readers can accurately count how many letters are in a word (e.g., 7 in "buffalo"),
remove a letter from a given position (e.g., "bufflo") or add a new one. The human brain of …

PARSTR: partially autoregressive scene text recognition

R Buoy, M Iwamura, S Srun, K Kise - International Journal on Document …, 2024 - Springer
An autoregressive (AR) decoder for scene text recognition (STR) requires numerous
generation steps to decode a text image character by character but can yield high …