Scene text recognition with permuted autoregressive sequence models

P Xu, W Shao, K Zhang, P Gao, S Liu, M Lei… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Vision-Language Models (LVLMs) have recently played a dominant role in
multimodal vision-language learning. Despite the great success, it lacks a holistic evaluation …

Cited by 118 · Related articles · All 3 versions

TrOCR: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org

Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

Cited by 314 · Related articles · All 4 versions

On the hidden mystery of OCR in large multimodal models

Y Liu, Z Li, B Yang, C Li, X Yin, C Liu, L Jin… - arXiv preprint arXiv …, 2023 - arxiv.org

Large models have recently played a dominant role in natural language processing and
multimodal vision-language learning. However, their effectiveness in text-related visual …

Cited by 89 · Related articles · All 2 versions

TextDiffuser: Diffusion models as text painters

J Chen, Y Huang, T Lv, L Cui… - Advances in Neural …, 2024 - proceedings.neurips.cc

Diffusion models have gained increasing attention for their impressive generation abilities
but currently struggle with rendering accurate and coherent text. To address this issue, we …

Cited by 45 · Related articles · All 5 versions

Revisiting scene text recognition: A data perspective

Q Jiang, J Wang, D Peng, C Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective.
We begin by revisiting the six commonly used benchmarks in STR and observe a trend of …

Cited by 23 · Related articles · All 5 versions

UniChart: A universal vision-language pretrained model for chart comprehension and reasoning

A Masry, P Kavehzadeh, XL Do, E Hoque… - arXiv preprint arXiv …, 2023 - arxiv.org

Charts are very popular for analyzing data, visualizing key insights and answering complex
reasoning questions about data. To facilitate chart-based data analysis using natural …

Cited by 46 · Related articles · All 5 versions

Weakly supervised scene text generation for low-resource languages

Y Xie, X Chen, H Zhan, P Shivakumara, B Yin… - Expert Systems with …, 2024 - Elsevier

A large number of annotated training images is crucial for training successful scene text
recognition models. However, collecting sufficient datasets can be a labor-intensive and …

Cited by 2 · Related articles · All 4 versions

CDistNet: Perceiving multi-domain character distance for robust text recognition

T Zheng, Z Chen, S Fang, H Xie, YG Jiang - International Journal of …, 2024 - Springer

The transformer-based encoder-decoder framework is becoming popular in scene text
recognition, largely because it naturally integrates recognition clues from both visual and …

Cited by 50 · Related articles · All 4 versions

LISTER: Neighbor decoding for length-insensitive scene text recognition

C Cheng, P Wang, C Da, Q Zheng… - Proceedings of the …, 2023 - openaccess.thecvf.com

The diversity in length constitutes a significant characteristic of text. Due to the long-tail
distribution of text lengths, most existing methods for scene text recognition (STR) only work …

Cited by 10 · Related articles · All 5 versions

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

KH Huang, HP Chan, YR Fung, H Qiu, M Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

Data visualization in the form of charts plays a pivotal role in data analysis, offering critical
insights and aiding in informed decision-making. Automatic chart understanding has …

Cited by 3 · Related articles · All 2 versions