PTSEFormer: Progressive temporal-spatial enhanced transformer towards video object detection | H Wang, J Tang, X Liu, S Guan, R Xie, L Song | European Conference on Computer Vision, 732-747, 2022 | 36 | 2022 |
Elysium: Exploring object-level perception in videos via MLLM | H Wang, Y Ye, Y Wang, Y Nie, C Huang | European Conference on Computer Vision, 166-185, 2024 | 7 | 2024 |
A bounding box is worth one token: Interleaving layout and text in a large language model for document understanding | J Lu, H Yu, Y Wang, Y Ye, J Tang, Z Yang, B Wu, Q Liu, H Feng, H Wang, ... | arXiv preprint arXiv:2407.01976, 2024 | 1 | 2024 |
TACOMORE: Leveraging the Potential of LLMs in Corpus-based Discourse Analysis with Prompt Engineering | B Li, H Wang | arXiv preprint arXiv:2412.10139, 2024 | | 2024 |
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM | H Wang, Y Nie, Y Ye, D GuanYu, Y Wang, S Li, H Yu, J Lu, C Huang | arXiv preprint arXiv:2412.09530, 2024 | | 2024 |
GloTSFormer: Global Video Text Spotting Transformer | H Wang, Y Wang, Y Li, C Huang | arXiv preprint arXiv:2401.03694, 2024 | | 2024 |
A Large-scale Sports Tracking Dataset and Progressive Re-detection Based Sports Tracking | H Wang, X Zhou, Q Xu, H Ren, R Xie, L Song | 2022 IEEE International Conference on Visual Communications and Image …, 2022 | | 2022 |