Cross-modal learning for multi-modal video categorization

P Kaur, HS Pannu, AK Malhi - Computer Science Review, 2021 - Elsevier

Human beings experience life through a spectrum of modes such as vision, taste, hearing,
smell, and touch. These multiple modes are integrated for information processing in our …

Cited by 92 · Related articles · All 3 versions

[PDF] arxiv.org

Multimodal conversational AI: A survey of datasets and approaches

A Sundar, L Heck - arXiv preprint arXiv:2205.06907, 2022 - arxiv.org

As humans, we experience the world with all our senses or modalities (sound, sight, touch,
smell, and taste). We use these modalities, particularly sight and touch, to convey and …

Cited by 28 · Related articles · All 6 versions

[PDF] researchgate.net

STAR++: Rethinking spatio-temporal cross attention transformer for video action recognition

D Ahn, S Kim, BC Ko - Applied Intelligence, 2023 - Springer

Video action recognition needs to model any differences by subdividing the spatio-temporal
features to distinguish various actions. We propose rethinking spatio-temporal cross …

Cited by 4 · Related articles · All 3 versions

[PDF] thecvf.com

Multi-modal multi-action video recognition

Z Shi, J Liang, Q Li, H Zheng, Z Gu… - Proceedings of the …, 2021 - openaccess.thecvf.com

Multi-action video recognition is much more challenging due to the requirement to recognize
multiple actions co-occurring simultaneously or sequentially. Modeling multi-action relations …

Cited by 11 · Related articles · All 3 versions

[PDF] ieee.org

Semantic image collection summarization with frequent subgraph mining

A Pasini, F Giobergia, E Pastor, E Baralis - IEEE Access, 2022 - ieeexplore.ieee.org

Applications such as providing a preview of personal albums (e.g., Google Photos) or
suggesting thematic collections based on user interests (e.g., Pinterest) require a …

Cited by 7 · Related articles · All 4 versions

Sentiment analysis of linguistic cues to assist medical image classification

P Kaur, AK Malhi, HS Pannu - Multimedia Tools and Applications, 2024 - Springer

Image classification is a challenging problem and often suffers from the bottleneck of visual
features. With the ever-growing availability of multimedia data with the help of the Internet …

Cited by 1 · Related articles

[PDF] arxiv.org

Advancing Perception in Artificial Intelligence through Principles of Cognitive Science

P Agrawal, C Tan, H Rathore - arXiv preprint arXiv:2310.08803, 2023 - arxiv.org

Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist
open problems and fundamental shortcomings related to performance and resource …

Cited by 1 · Related articles · All 2 versions

[PDF] springer.com

D²F: discriminative dense fusion of appearance and motion modalities for end-to-end video classification

L Wang, X Wang, A Hawbani, Y Xiong… - Multimedia Tools and …, 2022 - Springer

Recently, two-stream networks with multi-modality inputs have shown to be of vital
importance for state-of-the-art video understanding. Previous deep systems typically employ …

Cited by 2 · Related articles · All 4 versions

[PDF] core.ac.uk

[PDF] Semantics-aware image understanding

A Pasini - 2021 - core.ac.uk

Deep learning models are characterized by high complexity and low interpretability, which
are the payload for obtaining precise results in difficult tasks such as image understanding …

Activity recognition in dark video based on both audio and video content

Y Zhang, X Zhen, L Shao, CGM Snoek - US Patent 11,960,576, 2024 - Google Patents

Videos captured in low light conditions can be processed in order to identify an activity being
performed in the video. The processing may use both the video and audio streams for …