Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational visual …, 2022 - Springer
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …

A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion

J Liu, X Fan, J Jiang, R Liu, Z Luo - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Image fusion integrates a series of images acquired from different sensors, e.g., infrared and
visible, outputting an image with richer information than either one. Traditional and recent …

Deep multi-view enhancement hashing for image retrieval

C Yan, B Gong, Y Wei, Y Gao - IEEE Transactions on Pattern …, 2020 - ieeexplore.ieee.org
Hashing is an efficient method for nearest neighbor search in large-scale data space by
embedding high-dimensional feature descriptors into a similarity preserving Hamming …

AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description

J Prudviraj, MI Reddy, C Vishnu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Generating multi-sentence descriptions for video is considered to be the most complex task
in computer vision and natural language understanding due to the intricate nature of video …

Unsupervised person re-identification via softened similarity learning

Y Lin, L Xie, Y Wu, C Yan… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Person re-identification (re-ID) is an important topic in computer vision. This paper studies
the unsupervised setting of re-ID, which does not require any labeled information and thus is …

Depth image denoising using nuclear norm and learning graph model

C Yan, Z Li, Y Zhang, Y Liu, X Ji, Y Zhang - ACM Transactions on …, 2020 - dl.acm.org
Depth image denoising is increasingly becoming the hot research topic nowadays, because
it reflects the three-dimensional scene and can be applied in various fields of computer …

Precise no-reference image quality evaluation based on distortion identification

C Yan, T Teng, Y Liu, Y Zhang, H Wang… - ACM Transactions on …, 2021 - dl.acm.org
The difficulty of no-reference image quality assessment (NR IQA) often lies in the lack of
knowledge about the distortion in the image, which makes quality assessment blind and …

TSP: Temporally-sensitive pretraining of video encoders for localization tasks

H Alwassel, S Giancola… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Due to the large memory footprint of untrimmed videos, current state-of-the-art video
localization methods operate atop precomputed video clip features. These features are …

Video super-resolution with temporal group attention

T Isobe, S Li, X Jia, S Yuan… - Proceedings of the …, 2020 - openaccess.thecvf.com
Video super-resolution, which aims at producing a high-resolution video from its
corresponding low-resolution version, has recently drawn increasing attention. In this work …