Video-LLaVA: Learning United Visual Representation by Alignment Before Projection. B Lin, B Zhu, Y Ye, M Ning, P Jin, L Yuan. arXiv preprint arXiv:2311.10122, 2023. Cited by 119.
LanguageBind: Extending Video-Language Pretraining to N-Modality by Language-Based Semantic Alignment. B Zhu, B Lin, M Ning, Y Yan, J Cui, HF Wang, Y Pang, W Jiang, J Zhang, ... arXiv preprint arXiv:2310.01852, 2023. Cited by 47.
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models. B Lin, Z Tang, Y Ye, J Cui, B Zhu, P Jin, J Zhang, M Ning, L Yuan. arXiv preprint arXiv:2401.15947, 2024. Cited by 36.
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-Based Large Language Models. M Ning, B Zhu, Y Xie, B Lin, J Cui, L Yuan, D Chen, L Yuan. arXiv preprint arXiv:2311.16103, 2023. Cited by 12.
LLMBind: A Unified Modality-Task Integration Framework. B Zhu, P Jin, M Ning, B Lin, J Huang, Q Song, M Pan, L Yuan. arXiv preprint arXiv:2402.14891, 2024. Cited by 3.
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions. L Chen, X Wei, J Li, X Dong, P Zhang, Y Zang, Z Chen, H Duan, B Lin, ... arXiv preprint arXiv:2406.04325, 2024. Cited by 2.
MagicTime: Time-Lapse Video Generation Models as Metamorphic Simulators. S Yuan, J Huang, Y Shi, Y Xu, R Zhu, B Lin, X Cheng, L Yuan, J Luo. arXiv preprint arXiv:2404.05014, 2024. Cited by 2.
BASALT Refines Binning from Metagenomic Data and Increases Resolution of Genome-Resolved Metagenomic Analysis. Z Qiu, L Yuan, CA Lian, B Lin, J Chen, R Mu, X Qiao, L Zhang, Z Xu, L Fan, ... Nature Communications 15 (1), 2179, 2024. Cited by 1.
UNIAA: A Unified Multi-Modal Image Aesthetic Assessment Baseline and Benchmark. Z Zhou, Q Wang, B Lin, Y Su, R Chen, X Tao, A Zheng, L Yuan, P Wan, ... arXiv preprint arXiv:2404.09619, 2024.