Text recognition in the wild: A survey
The history of text can be traced back over thousands of years. Rich and precise semantic
information carried by text is important in a wide range of vision-based application …
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
DeepSeek-VL: Towards real-world vision-language understanding
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-
world vision and language understanding applications. Our approach is structured around …
Text detection, recognition, and script identification in natural scene images: A Review
V Naosekpam, N Sahu - International Journal of Multimedia Information …, 2022 - Springer
Text in natural scene images plays a vital role in scene understanding. It contains a rich and
abundant amount of valuable semantic information useful in many applications such as …
Scene text recognition with permuted autoregressive sequence models
D Bautista, R Atienza - European conference on computer vision, 2022 - Springer
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …
InternLM-XComposer-2.5: A versatile large vision language model supporting long-contextual input and output
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that
supports long-contextual input and output. IXC-2.5 excels in various text-image …
SwinTextSpotter: Scene text spotting via better synergy between text detection and text recognition
End-to-end scene text spotting has attracted great attention in recent years due to the
success of excavating the intrinsic synergy of the scene text detection and recognition …
ABCNet: Real-time scene text spotting with adaptive Bezier-curve network
Scene text detection and recognition has received increasing research attention. Existing
methods can be roughly categorized into two groups: character-based and segmentation …
Turning a CLIP model into a scene text detector
The recent large-scale Contrastive Language-Image Pretraining (CLIP) model has shown
great potential in various downstream tasks via leveraging the pretrained vision and …
Towards end-to-end unified scene text detection and layout analysis
Scene text detection and document layout analysis have long been treated as two separate
tasks in different image domains. In this paper, we bring them together and introduce the …