Few could be better than all: Feature sampling and grouping for scene text detection J Tang, W Zhang, H Liu, MK Yang, B Jiang, G Hu, X Bai Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 73 | 2022 |
SPTS v2: single-point scene text spotting Y Liu, J Zhang, D Peng, M Huang, X Wang, J Tang, C Huang, D Lin, ... IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 | 27 | 2023 |
DocPedia: Unleashing the power of large multimodal model in the frequency domain for versatile document understanding H Feng, Q Liu, H Liu, W Zhou, H Li, C Huang arXiv preprint arXiv:2311.11810, 2023 | 22 | 2023 |
UniDoc: A universal large multimodal model for simultaneous text detection, recognition, spotting and understanding H Feng, Z Wang, J Tang, J Lu, W Zhou, H Li, C Huang arXiv preprint arXiv:2308.11592, 2023 | 19 | 2023 |
You can even annotate text with voice: Transcription-only-supervised text spotting J Tang, S Qiao, B Cui, Y Ma, S Zhang, D Kanoulas Proceedings of the 30th ACM International Conference on Multimedia, 4154-4163, 2022 | 11 | 2022 |
TextSquare: Scaling up Text-Centric Visual Instruction Tuning J Tang, C Lin, Z Zhao, S Wei, B Wu, Q Liu, H Feng, Y Li, S Wang, L Liao, ... arXiv preprint arXiv:2404.12803, 2024 | 5 | 2024 |
Optimal boxes: boosting end-to-end scene text recognition by adjusting annotated bounding boxes via reinforcement learning J Tang, W Qian, L Song, X Dong, L Li, X Bai European Conference on Computer Vision, 233-248, 2022 | 3 | 2022 |
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering J Tang, Q Liu, Y Ye, J Lu, S Wei, C Lin, W Li, MFFB Mahmood, H Feng, ... arXiv preprint arXiv:2405.11985, 2024 | 2 | 2024 |
Harmonizing Visual Text Comprehension and Generation Z Zhao, J Tang, B Wu, C Lin, S Wei, H Liu, X Tan, Z Zhang, C Huang, ... arXiv preprint arXiv:2407.16364, 2024 | | 2024 |
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding J Lu, H Yu, Y Wang, Y Ye, J Tang, Z Yang, B Wu, Q Liu, H Feng, H Wang, ... arXiv preprint arXiv:2407.01976, 2024 | | 2024 |
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy W Zhao, H Feng, Q Liu, J Tang, S Wei, B Wu, L Liao, Y Ye, H Liu, H Li, ... arXiv preprint arXiv:2406.01326, 2024 | | 2024 |
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer Z Zhao, C Huang, B Wu, C Lin, H Liu, Z Zhang, X Tan, J Tang, Y Xie (CVPR 2024) arXiv preprint arXiv:2311.13120, 2023 | | 2023 |
Character recognition competition for street view shop signs J Tang, W Du, B Wang, W Zhou, S Mei, T Xue, X Xu, H Zhang National Science Review 10 (6), nwad141, 2023 | | 2023 |