TrOCR: Transformer-based optical character recognition with pre-trained models
Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …
Scene text recognition with permuted autoregressive sequence models
D Bautista, R Atienza - European conference on computer vision, 2022 - Springer
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
CLIP4STR: A simple baseline for scene text recognition with pre-trained vision-language model
Pre-trained vision-language models (VLMs) are the de-facto foundation models for various
downstream tasks. However, scene text recognition methods still prefer backbones pre …
Text recognition model based on multi-scale fusion CRNN
L Zou, Z He, K Wang, Z Wu, Y Wang, G Zhang, X Wang - Sensors, 2023 - mdpi.com
Scene text recognition is a crucial area of research in computer vision. However, current
mainstream scene text recognition models suffer from incomplete feature extraction due to …
A review of optical text recognition from distorted scene image
OO Sumady, BJ Antoni, R Nasuta… - 2022 4th International …, 2022 - ieeexplore.ieee.org
The growing number of images with text taken from a natural position increases the amount
of text distortion. Some challenges come because of distortion, curvature, or blur which …
Soft-edge-guided significant coordinate attention network for scene text image super-resolution
C Xi, K Zhang, X He, Y Hu, J Chen - The Visual Computer, 2024 - Springer
Scene text image super-resolution (STISR) aims to enhance the resolution and visual quality
of low-resolution scene text images, thereby improving the performance of some text-related …
ViTSTR-Transducer: Cross-Attention-Free Vision Transformer Transducer for Scene Text Recognition
Attention-based encoder–decoder scene text recognition (STR) architectures have been
proven effective in recognizing text in the real world, thanks to their ability to learn an internal …
Towards reduced-complexity scene text recognition (RCSTR) through a novel salient feature selection
The integration of an attention mechanism has played a crucial role in many recent scene
text recognition (STR) methods. It enables the capture of spatial feature dependencies …
Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models
Human readers can accurately count how many letters are in a word (e.g., 7 in "buffalo"),
remove a letter from a given position (e.g., "bufflo") or add a new one. The human brain of …
PARSTR: partially autoregressive scene text recognition
An autoregressive (AR) decoder for scene text recognition (STR) requires numerous
generation steps to decode a text image character by character but can yield high …