Comparative analysis on cross-modal information retrieval: A review

P Kaur, HS Pannu, AK Malhi - Computer Science Review, 2021 - Elsevier
Human beings experience life through a spectrum of modes such as vision, taste, hearing,
smell, and touch. These multiple modes are integrated for information processing in our …

Multimodal conversational AI: A survey of datasets and approaches

A Sundar, L Heck - arXiv preprint arXiv:2205.06907, 2022 - arxiv.org
As humans, we experience the world with all our senses or modalities (sound, sight, touch,
smell, and taste). We use these modalities, particularly sight and touch, to convey and …

STAR++: Rethinking spatio-temporal cross attention transformer for video action recognition

D Ahn, S Kim, BC Ko - Applied Intelligence, 2023 - Springer
Video action recognition needs to model any differences by subdividing the spatio-temporal
features to distinguish various actions. We propose rethinking spatio-temporal cross …

Multi-modal multi-action video recognition

Z Shi, J Liang, Q Li, H Zheng, Z Gu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Multi-action video recognition is much more challenging due to the requirement to recognize
multiple actions co-occurring simultaneously or sequentially. Modeling multi-action relations …

Semantic image collection summarization with frequent subgraph mining

A Pasini, F Giobergia, E Pastor, E Baralis - IEEE Access, 2022 - ieeexplore.ieee.org
Applications such as providing a preview of personal albums (e.g., Google Photos) or
suggesting thematic collections based on user interests (e.g., Pinterest) require a …

Sentiment analysis of linguistic cues to assist medical image classification

P Kaur, AK Malhi, HS Pannu - Multimedia Tools and Applications, 2024 - Springer
Image classification is a challenging problem and often suffers from the bottleneck of visual
features. With the ever-growing availability of multimedia data with the help of the Internet …

Advancing Perception in Artificial Intelligence through Principles of Cognitive Science

P Agrawal, C Tan, H Rathore - arXiv preprint arXiv:2310.08803, 2023 - arxiv.org
Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist
open problems and fundamental shortcomings related to performance and resource …

D2F: discriminative dense fusion of appearance and motion modalities for end-to-end video classification

L Wang, X Wang, A Hawbani, Y Xiong… - Multimedia Tools and …, 2022 - Springer
Recently, two-stream networks with multi-modality inputs have shown to be of vital
importance for state-of-the-art video understanding. Previous deep systems typically employ …

Semantics-aware image understanding

A Pasini - 2021 - core.ac.uk
Deep learning models are characterized by high complexity and low interpretability, which
are the payload for obtaining precise results in difficult tasks such as image understanding …

Activity recognition in dark video based on both audio and video content

Y Zhang, X Zhen, L Shao, CGM Snoek - US Patent 11,960,576, 2024 - Google Patents
Videos captured in low light conditions can be processed in order to identify an activity being
performed in the video. The processing may use both the video and audio streams for …