Attention mechanisms in computer vision: A survey
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …
A review of deep learning for video captioning
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …
Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion
Image fusion integrates a series of images acquired from different sensors, e.g., infrared and
visible, outputting an image with richer information than either one. Traditional and recent …
Deep multi-view enhancement hashing for image retrieval
Hashing is an efficient method for nearest neighbor search in large-scale data space by
embedding high-dimensional feature descriptors into a similarity preserving Hamming …
AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description
J Prudviraj, MI Reddy, C Vishnu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Generating multi-sentence descriptions for video is considered to be the most complex task
in computer vision and natural language understanding due to the intricate nature of video …
Unsupervised person re-identification via softened similarity learning
Person re-identification (re-ID) is an important topic in computer vision. This paper studies
the unsupervised setting of re-ID, which does not require any labeled information and thus is …
Depth image denoising using nuclear norm and learning graph model
Depth image denoising is increasingly becoming a hot research topic, because
it reflects the three-dimensional scene and can be applied in various fields of computer …
Precise no-reference image quality evaluation based on distortion identification
The difficulty of no-reference image quality assessment (NR IQA) often lies in the lack of
knowledge about the distortion in the image, which makes quality assessment blind and …
TSP: Temporally-sensitive pretraining of video encoders for localization tasks
H Alwassel, S Giancola… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Due to the large memory footprint of untrimmed videos, current state-of-the-art video
localization methods operate atop precomputed video clip features. These features are …
Video super-resolution with temporal group attention
Video super-resolution, which aims at producing a high-resolution video from its
corresponding low-resolution version, has recently drawn increasing attention. In this work …