Two stage multi-modal modeling for video interaction analysis in deep video understanding challenge

F Zhao, C Zhang, B Geng - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal Artificial Intelligence (Multimodal AI), in general, involves various types of data
(e.g., images, texts, or data collected from different sensors), feature engineering (e.g. …

Cited by 12 Related articles

Shifted GCN-GAT and Cumulative-Transformer based Social Relation Recognition for Long Videos

H Wang, Y Hu, Y Zhu, J Qi, B Wu - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Social Relation Recognition is an important part of Video Understanding, providing insights
into the information that videos convey. Most previous works mainly focused on graph …

Cited by 2 Related articles All 2 versions

Deep Video Understanding with Video-Language Model

R Liu, Y Fang, F Yu, R Tian, T Ren, G Wu - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Pre-trained video-language models (VLMs) have shown superior performance in high-level
video understanding tasks, analyzing multi-modal information, aligning with Deep Video …

A Hierarchical Deep Video Understanding Method with Shot-Based Instance Search and Large Language Model

R Li, J Guo, M Li, Z Wu, C Liang - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Deep video understanding (DVU) is often considered a challenge due to the aim of
interpreting a video with storyline, which is designed to solve two levels of problems …

A Deep Understanding Video Q&A System for Film Education in Acting Department

Z Wu, R Li, J Guo, Z Wang… - … Conference on Intelligent …, 2023 - ieeexplore.ieee.org

Recently, advancements in artificial intelligence technology have greatly influenced the field
of education, particularly in the area of intelligent homework assistance. However, current …

[PDF] WHU-NERCMS AT TRECVID 2023: AD-HOC VEDIO SEARCH (AVS) AND DEEP VIDEO UNDERSTANDING (DVU) TASKS

J He, R Li, J Guo, H Zhang, M Li, Z Wu, Z Wang, B Du… - www-nlpir.nist.gov

The WHU-NERCMS team participated in the Ad-hoc Video Search (AVS) and Deep Video
Understanding (DVU) tasks at TRECVID 2023. For AVS task, we chose to utilize embedding …