How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites
In this report, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
Towards robust tampered text detection in document image: New dataset and new solution
Recently, tampered text detection in document images has attracted increasing attention
due to its essential role in information security. However, detecting visually consistent …
Modeling entities as semantic points for visual information extraction in the wild
Recently, Visual Information Extraction (VIE) has become increasingly
important in both academia and industry, due to the wide range of real-world applications …
UAV localization in low-altitude GNSS-denied environments based on POI and store signage text matching in UAV images
Y Liu, J Bai, G Wang, X Wu, F Sun, Z Guo, H Geng - Drones, 2023 - mdpi.com
Localization is the most important basic information for unmanned aerial vehicles (UAV)
during their missions. Currently, most UAVs use GNSS to calculate their own position …
Bridging the Gap Between End-to-End and Two-Step Text Spotting
Modularity plays a crucial role in the development and maintenance of complex systems.
While end-to-end text spotting efficiently mitigates the issues of error accumulation and sub …
Context perception parallel decoder for scene text recognition
Scene text recognition (STR) methods have struggled to attain high accuracy and fast
inference speed. Autoregressive (AR)-based STR model uses the previously recognized …
Toward real text manipulation detection: New dataset and new solution
With the surge in realistic text tampering, detecting fraudulent text in images has gained
prominence for maintaining information security. However, the high costs associated with …
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
In recent years, text-image joint pre-training techniques have shown promising results in
various tasks. However, in Optical Character Recognition (OCR) tasks, aligning text instances …
A Comprehensive Framework for Industrial Sticker Information Recognition Using Advanced OCR and Object Detection Techniques
Recent advancements in Artificial Intelligence (AI), deep learning (DL), and computer vision
have revolutionized various industrial processes through image classification and object …
ESGNet: A multimodal network model incorporating entity semantic graphs for information extraction from Chinese resumes
S Luo, J Yu - Information Processing & Management, 2024 - Elsevier
Corporations require screening critical information from numerous resumes with different
formats and content for managerial decision-making. However, traditional manual screening …