Comparative analysis on cross-modal information retrieval: A review
Human beings experience life through a spectrum of modes such as vision, taste, hearing,
smell, and touch. These multiple modes are integrated for information processing in our …
Multimodal conversational AI: A survey of datasets and approaches
As humans, we experience the world with all our senses or modalities (sound, sight, touch,
smell, and taste). We use these modalities, particularly sight and touch, to convey and …
STAR++: Rethinking spatio-temporal cross attention transformer for video action recognition
Video action recognition needs to model any differences by subdividing the spatio-temporal
features to distinguish various actions. We propose rethinking spatio-temporal cross …
Multi-modal multi-action video recognition
Multi-action video recognition is much more challenging due to the requirement to recognize
multiple actions co-occurring simultaneously or sequentially. Modeling multi-action relations …
Semantic image collection summarization with frequent subgraph mining
Applications such as providing a preview of personal albums (e.g., Google Photos) or
suggesting thematic collections based on user interests (e.g., Pinterest) require a …
Sentiment analysis of linguistic cues to assist medical image classification
Image classification is a challenging problem and often suffers from the bottleneck of visual
features. With the ever-growing availability of multimedia data with the help of the Internet …
Advancing Perception in Artificial Intelligence through Principles of Cognitive Science
Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist
open problems and fundamental shortcomings related to performance and resource …
D2F: discriminative dense fusion of appearance and motion modalities for end-to-end video classification
Recently, two-stream networks with multi-modality inputs have shown to be of vital
importance for state-of-the-art video understanding. Previous deep systems typically employ …
Semantics-aware image understanding
A Pasini - 2021 - core.ac.uk
Deep learning models are characterized by high complexity and low interpretability, which
are the payload for obtaining precise results in difficult tasks such as image understanding …
Activity recognition in dark video based on both audio and video content
Videos captured in low light conditions can be processed in order to identify an activity being
performed in the video. The processing may use both the video and audio streams for …