Lvlm-ehub: A comprehensive evaluation benchmark for large vision-language models

P Xu, W Shao, K Zhang, P Gao, S Liu, M Lei… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Vision-Language Models (LVLMs) have recently played a dominant role in
multimodal vision-language learning. Despite the great success, it lacks a holistic evaluation …

Trocr: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org
Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

On the hidden mystery of ocr in large multimodal models

Y Liu, Z Li, B Yang, C Li, X Yin, C Liu, L Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
Large models have recently played a dominant role in natural language processing and
multimodal vision-language learning. However, their effectiveness in text-related visual …

Textdiffuser: Diffusion models as text painters

J Chen, Y Huang, T Lv, L Cui… - Advances in Neural …, 2024 - proceedings.neurips.cc
Diffusion models have gained increasing attention for their impressive generation abilities
but currently struggle with rendering accurate and coherent text. To address this issue, we …

Revisiting scene text recognition: A data perspective

Q Jiang, J Wang, D Peng, C Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective.
We begin by revisiting the six commonly used benchmarks in STR and observe a trend of …

Unichart: A universal vision-language pretrained model for chart comprehension and reasoning

A Masry, P Kavehzadeh, XL Do, E Hoque… - arXiv preprint arXiv …, 2023 - arxiv.org
Charts are very popular for analyzing data, visualizing key insights and answering complex
reasoning questions about data. To facilitate chart-based data analysis using natural …

Weakly supervised scene text generation for low-resource languages

Y Xie, X Chen, H Zhan, P Shivakumara, B Yin… - Expert Systems with …, 2024 - Elsevier
A large number of annotated training images is crucial for training successful scene text
recognition models. However, collecting sufficient datasets can be a labor-intensive and …

Cdistnet: Perceiving multi-domain character distance for robust text recognition

T Zheng, Z Chen, S Fang, H Xie, YG Jiang - International Journal of …, 2024 - Springer
The transformer-based encoder-decoder framework is becoming popular in scene text
recognition, largely because it naturally integrates recognition clues from both visual and …

LISTER: Neighbor decoding for length-insensitive scene text recognition

C Cheng, P Wang, C Da, Q Zheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
The diversity in length constitutes a significant characteristic of text. Due to the long-tail
distribution of text lengths, most existing methods for scene text recognition (STR) only work …

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

KH Huang, HP Chan, YR Fung, H Qiu, M Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical
insights and aiding in informed decision-making. Automatic chart understanding has …