SeeClick: Harnessing GUI grounding for advanced visual GUI agents K Cheng, Q Sun, Y Chu, F Xu, Y Li, J Zhang, Z Wu arXiv preprint arXiv:2401.10935, 2024 | 24 | 2024 |
Beyond generic: Enhancing image captioning with real-world knowledge using vision-language pre-training model K Cheng, W Song, Z Ma, W Zhu, Z Zhu, J Zhang Proceedings of the 31st ACM International Conference on Multimedia, 5038-5047, 2023 | 5 | 2023 |
A survey of neural code intelligence: Paradigms, advances and beyond Q Sun, Z Chen, F Xu, K Cheng, C Ma, Z Yin, J Wang, C Han, R Zhu, ... arXiv preprint arXiv:2403.14734, 2024 | 4 | 2024 |
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models Z Ma, M Pan, W Wu, K Cheng, J Zhang, S Huang, J Chen Proceedings of the 31st ACM International Conference on Multimedia, 5674-5685, 2023 | 2 | 2023 |
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models F Xu, Q Sun, K Cheng, J Liu, Y Qiao, Z Wu arXiv preprint arXiv:2406.11736, 2024 | 1 | 2024 |
ADS-Cap: A framework for accurate and diverse stylized captioning with unpaired stylistic corpora K Cheng, Z Ma, S Zong, J Zhang, X Dai, J Chen CCF International Conference on Natural Language Processing and Chinese …, 2022 | 1 | 2022 |
Probing Commonsense Reasoning Capability of Text-to-Image Generative Models via Non-visual Description M Pan, J Li, M Yu, Z Ma, K Cheng, J Zhang, J Chen arXiv preprint arXiv:2312.07294, 2023 | | 2023 |