Two-stream transformer architecture for long video understanding

E Fish, J Weinbren, A Gilbert - arXiv preprint arXiv:2208.01753, 2022 - arxiv.org
Pure vision transformer architectures are highly effective for short video classification and
action recognition tasks. However, due to the quadratic complexity of self attention and lack …

Hierarchical few-shot learning based on coarse- and fine-grained relation network

Z Wu, H Zhao - Artificial Intelligence Review, 2023 - Springer
Few-shot learning plays an important role in the field of machine learning. Many existing
methods based on relation network achieve satisfactory results. However, these methods …

Incorporating domain knowledge graph into multimodal movie genre classification with self-supervised attention and contrastive learning

J Li, G Qi, C Zhang, Y Chen, Y Tan, C Xia… - Proceedings of the 31st …, 2023 - dl.acm.org
Multimodal movie genre classification has always been regarded as a demanding
multi-label classification task due to the diversity of multimodal data such as posters, plot …

A unified framework to catalogue and classify digital games based on interaction design and validation through clustering techniques

L Cormio, T Agostinelli, M Mengoni - Multimedia Tools and Applications, 2024 - Springer
The digital games industry has grown exponentially due to the diversification of games and
the increasing multiplicity of the user target base. The market explosion and the great variety …

Automatic movie genre classification & emotion recognition via a BiProjection Multimodal Transformer

DA Moreno-Galván, R López-Santillán… - Information …, 2025 - Elsevier
Analyzing, manipulating, and comprehending data from multiple sources (e.g., websites,
software applications, files, or databases) and of diverse modalities (e.g., video, images …

Movie tag prediction: An extreme multi-label multi-modal transformer-based solution with explanation

M Guarascio, M Minici, FS Pisani… - Journal of Intelligent …, 2024 - Springer
Providing rich and accurate metadata for indexing media content is a crucial problem for all
the companies offering streaming entertainment services. These metadata are commonly …

Deep Learning Approach for Seamless Navigation in Multi-View Streaming Applications

TS Costa, P Viana, MT Andrade - IEEE Access, 2023 - ieeexplore.ieee.org
Quality of Experience (QoE) in multi-view streaming systems is known to be severely
affected by the latency associated with view-switching procedures. Anticipating the …

Exploration of Speech and Music Information for Movie Genre Classification

M Bhattacharjee, P Guha - ACM Transactions on Multimedia Computing …, 2024 - dl.acm.org
Movie genre prediction from trailers is mostly attempted in a multi-modal manner. However,
the characteristics of movie trailer audio indicate that this modality alone might be highly …

Multilevel profiling of situation and dialogue-based deep networks for movie genre classification using movie trailers

DK Vishwakarma, M Jindal, A Mittal… - arXiv preprint arXiv …, 2021 - arxiv.org
Automated movie genre classification has emerged as an active and essential area of
research and exploration. Short duration movie trailers provide useful insights about the …

Learning and explanation of extreme multi-label deep classification models for media content

M Minici, FS Pisani, M Guarascio… - … on Methodologies for …, 2022 - Springer
Providing rich and accurate metadata for indexing media content is a crucial problem for all
the companies offering streaming entertainment services. These metadata are typically used …