Convolutional neural network: a review of models, methodologies and applications to object detection
A Dhillon, GK Verma - Progress in Artificial Intelligence, 2020 - Springer
Deep learning has developed as an effective machine learning method that takes in
numerous layers of features or representation of the data and provides state-of-the-art …
Neural machine translation: A review
F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …
AI choreographer: Music conditioned 3D dance generation with AIST++
We present AIST++, a new multi-modal dataset of 3D dance motion and music, along with
FACT, a Full-Attention Cross-modal Transformer network for generating 3D dance motion …
X-Pool: Cross-modal language-video attention for text-video retrieval
In text-video retrieval, the objective is to learn a cross-modal similarity function between a
text and a video that ranks relevant text-video pairs higher than irrelevant pairs. However …
End-to-end dense video captioning with parallel decoding
Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated "localize-then-describe" …
TEA: Temporal excitation and aggregation for action recognition
Temporal modeling is key for action recognition in videos. It normally considers both short-
range motions and long-range aggregations. In this paper, we propose a Temporal …
ActBERT: Learning global-local video-text representations
In this paper, we introduce ActBERT for self-supervised learning of joint video-text
representations from unlabeled data. First, we leverage global action information to catalyze …
An attentive survey of attention models
Attention Model has now become an important concept in neural networks that has been
researched within diverse application domains. This survey provides a structured and …
Attention, please! A survey of neural attention models in deep learning
A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …
Object relational graph with teacher-recommended learning for video captioning
Taking full advantage of the information from both vision and language is critical for the
video captioning task. Existing models lack adequate visual representation due to the …