kun yao
Verified email at baidu.com
Title
Cited by
Year
StrucTexT: Structured text understanding with multi-modal transformers
Y Li, Y Qian, Y Yu, X Qin, C Zhang, Y Liu, K Yao, J Han, J Liu, E Ding
Proceedings of the 29th ACM International Conference on Multimedia, 1912-1920, 2021
Cited by: 115, Year: 2021
Group DETR: Fast DETR training with group-wise one-to-many assignment
Q Chen, X Chen, J Wang, S Zhang, K Yao, H Feng, J Han, E Ding, G Zeng, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 85, Year: 2023
ViSTA: Vision and scene text aggregation for cross-modal retrieval
M Cheng, Y Sun, L Wang, X Zhu, K Yao, J Chen, G Song, J Han, J Liu, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
Cited by: 68, Year: 2022
MaskOCR: Text recognition with masked encoder-decoder pretraining
P Lyu, C Zhang, S Liu, M Qiao, Y Xu, L Wu, K Yao, J Han, E Ding, J Wang
arXiv preprint arXiv:2206.00311, 2022
Cited by: 38, Year: 2022
StrucTexTv2: Masked visual-textual prediction for document image pre-training
Y Yu, Y Li, C Zhang, X Zhang, Z Guo, X Qin, K Yao, J Han, E Ding, J Wang
arXiv preprint arXiv:2303.00289, 2023
Cited by: 37, Year: 2023
CAE v2: Context autoencoder with CLIP target
X Zhang, J Chen, J Yuan, Q Chen, J Wang, X Wang, S Han, X Chen, J Pi, ...
arXiv preprint arXiv:2211.09799, 2022
Cited by: 23, Year: 2022
Decoupling recognition from detection: Single shot self-reliant scene text spotter
J Wu, P Lyu, G Lu, C Zhang, K Yao, W Pei
Proceedings of the 30th ACM International Conference on Multimedia, 1319-1328, 2022
Cited by: 21, Year: 2022
Group pose: A simple baseline for end-to-end multi-person pose estimation
H Liu, Q Chen, Z Tan, JJ Liu, J Wang, X Su, X Li, K Yao, J Han, E Ding, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 18, Year: 2023
Learning structure-guided diffusion model for 2D human pose estimation
Z Qiu, Q Yang, J Wang, X Wang, C Xu, D Fu, K Yao, J Han, E Ding, ...
arXiv preprint arXiv:2306.17074, 2023
Cited by: 13, Year: 2023
FROSTER: Frozen CLIP is a strong teacher for open-vocabulary action recognition
X Huang, H Zhou, K Yao, K Han
arXiv preprint arXiv:2402.03241, 2024
Cited by: 10, Year: 2024
Towards robust real-time scene text detection: From semantic to instance representation learning
X Qin, P Lyu, C Zhang, Y Zhou, K Yao, P Zhang, H Lin, W Wang
Proceedings of the 31st ACM International Conference on Multimedia, 2025-2034, 2023
Cited by: 7, Year: 2023
TRUST: An accurate and end-to-end table structure recognizer using splitting-based transformers
Z Guo, Y Yu, P Lv, C Zhang, H Li, Z Wang, K Yao, J Liu, J Wang
arXiv preprint arXiv:2208.14687, 2022
Cited by: 7, Year: 2022
HAP: Structure-aware masked image modeling for human-centric perception
J Yuan, X Zhang, H Zhou, J Wang, Z Qiu, Z Shao, S Zhang, S Long, ...
Advances in Neural Information Processing Systems 36, 2024
Cited by: 6, Year: 2024
GridFormer: Towards accurate table structure recognition via grid prediction
P Lyu, W Ma, H Wang, Y Yu, C Zhang, K Yao, Y Xue, J Wang
Proceedings of the 31st ACM International Conference on Multimedia, 7747-7757, 2023
Cited by: 5, Year: 2023
Fast-StrucTexT: An efficient hourglass transformer with modality-guided dynamic token merge for document understanding
M Zhai, Y Li, X Qin, C Yi, Q Xie, C Zhang, K Yao, Y Wu, Y Jia
arXiv preprint arXiv:2305.11392, 2023
Cited by: 5, Year: 2023
ICDAR 2023 competition on structured text extraction from visually-rich document images
W Yu, C Zhang, H Cao, W Hua, B Li, H Chen, M Liu, M Chen, J Kuang, ...
International Conference on Document Analysis and Recognition, 536-552, 2023
Cited by: 4, Year: 2023
CAE v2: Context autoencoder with CLIP latent alignment
X Zhang, J Chen, J Yuan, Q Chen, J Wang, X Wang, S Han, X Chen, J Pi, ...
Transactions on Machine Learning Research, 2023
Cited by: 4, Year: 2023
MataDoc: Margin and text aware document dewarping for arbitrary boundary
B Dai, Q Xie, Y Li, X Qin, C Zhang, K Yao, J Han
arXiv preprint arXiv:2307.12571, 2023
Cited by: 2, Year: 2023
OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer
Y Wang, X Su, Q Chen, X Zhang, T Xi, K Yao, E Ding, G Zhang, J Wang
arXiv preprint arXiv:2407.10655, 2024
Cited by: 1, Year: 2024
LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection
Q Chen, X Su, X Zhang, J Wang, J Chen, Y Shen, C Han, Z Chen, W Xu, ...
arXiv preprint arXiv:2406.03459, 2024
Cited by: 1, Year: 2024