Title | Authors | Venue | Cited by | Year |
STAR: A Benchmark for Situated Reasoning in Real-world Videos | B Wu, S Yu, Z Chen, JB Tenenbaum, C Gan | NeurIPS, 2021 | 107 | 2021 |
Self-chained Image-Language Model for Video Localization and Question Answering | S Yu, J Cho, P Yadav, M Bansal | NeurIPS, 2023 | 66 | 2023 |
A Simple LLM Framework for Long-range Video Question-Answering | C Zhang, T Lu, MM Islam, Z Wang, S Yu, M Bansal, G Bertasius | arXiv preprint arXiv:2312.17235, 2023 | 20 | 2023 |
Regularity Learning via Explicit Distribution Modeling for Skeletal Video Anomaly Detection | S Yu, Z Zhao, H Fang, A Deng, H Su, D Wang, W Gan, C Lu, W Wu | IEEE TCSVT, 2023 | 8 | 2023 |
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | Z Wang*, S Yu*, E Stengel-Eskin*, J Yoon, F Cheng, G Bertasius, ... | arXiv preprint arXiv:2405.19209, 2024 | 2 | 2024 |
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion | S Yu, J Yoon, M Bansal | arXiv preprint arXiv:2402.05889, 2024 | 2* | 2024 |
RACCooN: Remove, Add, and Change Video Content with Auto-Generated Narratives | J Yoon*, S Yu*, M Bansal | arXiv preprint arXiv:2405.18406, 2024 | | 2024 |