Convolutional neural network: a review of models, methodologies and applications to object detection

A Dhillon, GK Verma - Progress in Artificial Intelligence, 2020 - Springer
Deep learning has developed as an effective machine learning method that takes in
numerous layers of features or representation of the data and provides state-of-the-art …

Neural machine translation: A review

F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …

Ai choreographer: Music conditioned 3d dance generation with aist++

R Li, S Yang, DA Ross… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We present AIST++, a new multi-modal dataset of 3D dance motion and music, along with
FACT, a Full-Attention Cross-modal Transformer network for generating 3D dance motion …

X-pool: Cross-modal language-video attention for text-video retrieval

SK Gorti, N Vouitsis, J Ma, K Golestan… - Proceedings of the …, 2022 - openaccess.thecvf.com
In text-video retrieval, the objective is to learn a cross-modal similarity function between a
text and a video that ranks relevant text-video pairs higher than irrelevant pairs. However …

End-to-end dense video captioning with parallel decoding

T Wang, R Zhang, Z Lu, F Zheng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated" localize-then-describe" …

Tea: Temporal excitation and aggregation for action recognition

Y Li, B Ji, X Shi, J Zhang, B Kang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Temporal modeling is key for action recognition in videos. It normally considers both short-
range motions and long-range aggregations. In this paper, we propose a Temporal …

Actbert: Learning global-local video-text representations

L Zhu, Y Yang - Proceedings of the IEEE/CVF conference …, 2020 - openaccess.thecvf.com
In this paper, we introduce ActBERT for self-supervised learning of joint video-text
representations from unlabeled data. First, we leverage global action information to catalyze …

An attentive survey of attention models

S Chaudhari, V Mithal, G Polatkan… - ACM Transactions on …, 2021 - dl.acm.org
Attention Model has now become an important concept in neural networks that has been
researched within diverse application domains. This survey provides a structured and …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Object relational graph with teacher-recommended learning for video captioning

Z Zhang, Y Shi, C Yuan, B Li, P Wang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Taking full advantage of the information from both vision and language is critical for the
video captioning task. Existing models lack adequate visual representation due to the …