Two-stream transformer architecture for long video understanding
E Fish, J Weinbren, A Gilbert - arXiv preprint arXiv:2208.01753, 2022 - arxiv.org
Pure vision transformer architectures are highly effective for short video classification and
action recognition tasks. However, due to the quadratic complexity of self-attention and lack …
Hierarchical few-shot learning based on coarse-and fine-grained relation network
Few-shot learning plays an important role in the field of machine learning. Many existing
methods based on relation network achieve satisfactory results. However, these methods …
Incorporating domain knowledge graph into multimodal movie genre classification with self-supervised attention and contrastive learning
Multimodal movie genre classification has always been regarded as a demanding
multi-label classification task due to the diversity of multimodal data such as posters, plot …
A unified framework to catalogue and classify digital games based on interaction design and validation through clustering techniques
The digital games industry has grown exponentially due to the diversification of games and
the increasing multiplicity of the user target base. The market explosion and the great variety …
Automatic movie genre classification & emotion recognition via a BiProjection Multimodal Transformer
DA Moreno-Galván, R López-Santillán… - Information …, 2025 - Elsevier
Analyzing, manipulating, and comprehending data from multiple sources (e.g., websites,
software applications, files, or databases) and of diverse modalities (e.g., video, images …
Movie tag prediction: An extreme multi-label multi-modal transformer-based solution with explanation
Providing rich and accurate metadata for indexing media content is a crucial problem for all
the companies offering streaming entertainment services. These metadata are commonly …
Deep Learning Approach for Seamless Navigation in Multi-View Streaming Applications
Quality of Experience (QoE) in multi-view streaming systems is known to be severely
affected by the latency associated with view-switching procedures. Anticipating the …
Exploration of Speech and Music Information for Movie Genre Classification
M Bhattacharjee, P Guha - ACM Transactions on Multimedia Computing …, 2024 - dl.acm.org
Movie genre prediction from trailers is mostly attempted in a multi-modal manner. However,
the characteristics of movie trailer audio indicate that this modality alone might be highly …
Multilevel profiling of situation and dialogue-based deep networks for movie genre classification using movie trailers
DK Vishwakarma, M Jindal, A Mittal… - arXiv preprint arXiv …, 2021 - arxiv.org
Automated movie genre classification has emerged as an active and essential area of
research and exploration. Short duration movie trailers provide useful insights about the …
Learning and explanation of extreme multi-label deep classification models for media content
Providing rich and accurate metadata for indexing media content is a crucial problem for all
the companies offering streaming entertainment services. These metadata are typically used …