Video-based emotion recognition using CNN-RNN and C3D hybrid networks Y Fan, X Lu, D Li, Y Liu Proceedings of the 18th ACM international conference on multimodal …, 2016 | 667 | 2016 |
Bridging video-text retrieval with multiple choice questions Y Ge, Y Ge, X Liu, D Li, Y Shan, X Qie, P Luo Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 151 | 2022 |
CLIP4Caption: CLIP for video caption M Tang, Z Wang, Z Liu, F Rao, D Li, X Li Proceedings of the 29th ACM International Conference on Multimedia, 4858-4862, 2021 | 123 | 2021 |
Transform domain transcoding from MPEG-2 to H.264 with interpolation drift-error compensation T Qian, J Sun, D Li, X Yang, J Wang IEEE Transactions on Circuits and Systems for Video Technology 16 (4), 523-534, 2006 | 45 | 2006 |
Masked image modeling with denoising contrast K Yi, Y Ge, X Li, S Yang, D Li, J Wu, Y Shan, X Qie arXiv preprint arXiv:2205.09616, 2022 | 37 | 2022 |
Learning scale-consistent attention part network for fine-grained image recognition H Liu, J Li, D Li, J See, W Lin IEEE Transactions on Multimedia 24, 2902-2913, 2021 | 29 | 2021 |
Enhancing self-supervised video representation learning via multi-level feature optimization R Qian, Y Li, H Liu, J See, S Ding, X Liu, D Li, W Lin Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 28 | 2021 |
TagGPT: Large language models are zero-shot multimodal taggers C Li, Y Ge, J Mao, D Li, Y Shan arXiv preprint arXiv:2304.03022, 2023 | 17 | 2023 |
CA-SSL: Class-agnostic semi-supervised learning for detection and segmentation L Qi, J Kuen, Z Lin, J Gu, F Rao, D Li, W Guo, Z Wen, MH Yang, J Jia European Conference on Computer Vision, 59-77, 2022 | 13 | 2022 |
Tencent-MVSE: A large-scale benchmark dataset for multi-modal video similarity evaluation Z Zeng, Y Luo, Z Liu, F Rao, D Li, W Guo, Z Wen Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 10 | 2022 |
RILS: Masked visual reconstruction in language semantic space S Yang, Y Ge, K Yi, D Li, Y Shan, X Qie, X Wang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 8* | 2023 |
CLIP4Caption++: Multi-CLIP for video caption M Tang, Z Wang, Z Zeng, F Rao, D Li arXiv preprint arXiv:2110.05204, 2021 | 8 | 2021 |
Vision-language instruction tuning: A review and analysis C Li, Y Ge, D Li, Y Shan Transactions on Machine Learning Research, 2023 | 5 | 2023 |
Transform domain transcoding from MPEG-2 to H.264 with interpolation error drift compensation T Qian, J Sun, D Li, X Yang, J Wang IEEE Workshop on Signal Processing Systems Design and Implementation, 2005 …, 2005 | 4 | 2005 |
Unified Pretraining Target Based Video-music Retrieval With Music Rhythm And Video Optical Flow Information T Mao, S Liu, Y Zhang, D Li, Y Shan ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 2 | 2024 |
Controllable augmentations for video representation learning R Qian, W Lin, J See, D Li Visual Intelligence 2 (1), 1, 2024 | 2 | 2024 |
Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos J Fei, D Li, Z Deng, Z Wang, G Liu, H Wang arXiv preprint arXiv:2408.14023, 2024 | 1 | 2024 |
MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation X Wang, D Li, Y Zhao, H Wang arXiv preprint arXiv:2407.12871, 2024 | | 2024 |
Humtrans: A Novel Open-Source Dataset for Humming Melody Transcription and Beyond S Liu, X Li, D Li, Y Shan ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | | 2024 |