Advancing high-resolution video-language representation with large-scale video transcriptions H Xue, T Hang, Y Zeng, Y Sun, B Liu, H Yang, J Fu, B Guo Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 135 | 2022 |
CLIP-ViP: Adapting pre-trained image-text model to video-language representation alignment H Xue, Y Sun, B Liu, J Fu, R Song, H Li, J Luo arXiv preprint arXiv:2209.06430, 2022 | 116* | 2022 |
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training H Xue, Y Huang, B Liu, H Peng, J Fu, H Li, J Luo Advances in Neural Information Processing Systems 34, 2021 | 83 | 2021 |
Unifying multimodal transformer for bi-directional image and text generation Y Huang, H Xue, B Liu, Y Lu Proceedings of the 29th ACM International Conference on Multimedia, 1138-1147, 2021 | 56 | 2021 |
Long-form video-language pre-training with multimodal temporal contrastive learning Y Sun, H Xue, R Song, B Liu, H Yang, J Fu Advances in Neural Information Processing Systems 35, 38032-38045, 2022 | 54 | 2022 |
Stare at what you see: Masked image modeling without reconstruction H Xue, P Gao, H Li, Y Qiao, H Sun, H Li, J Luo Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 25* | 2023 |
Learning fine-grained motion embedding for landscape animation H Xue, B Liu, H Yang, J Fu, H Li, J Luo Proceedings of the 29th ACM International Conference on Multimedia (Oral …, 2021 | 6 | 2021 |
Semantic Tag Augmented XlanV Model for Video Captioning Y Huang*, H Xue*, J Chen, H Ma, H Ma Proceedings of the 29th ACM International Conference on Multimedia, 4818-4822, 2021 | 5 | 2021 |
SED-Net: detecting multi-type edits of images H Xue, H Liu, J Li, H Li, J Luo 2020 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2020 | 3 | 2020 |
Visual Perception by Large Language Model's Weights F Ma, H Xue, G Wang, Y Zhou, F Rao, S Yan, Y Zhang, S Wu, MZ Shou, ... arXiv preprint arXiv:2405.20339, 2024 | | 2024 |
Multi-Modal Generative Embedding Model F Ma, H Xue, G Wang, Y Zhou, F Rao, S Yan, Y Zhang, S Wu, MZ Shou, ... arXiv preprint arXiv:2405.19333, 2024 | | 2024 |
Supplementary Material: Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning Y Sun, H Xue, R Song, B Liu, H Yang, J Fu | | |
Supplementary Material: Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions H Xue, T Hang, Y Zeng, Y Sun, B Liu, H Yang, J Fu, B Guo | | |