STAT: Spatial-temporal attention mechanism for video captioning

Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational visual …, 2022 - Springer

Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …

Cited by 1789 - Related articles - All 8 versions

A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

Cited by 21 - Related articles - All 3 versions

Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion

J Liu, X Fan, J Jiang, R Liu, Z Luo - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Image fusion integrates a series of images acquired from different sensors, e.g., infrared and
visible, outputting an image with richer information than either one. Traditional and recent …

Cited by 263 - Related articles - All 2 versions

Deep multi-view enhancement hashing for image retrieval

C Yan, B Gong, Y Wei, Y Gao - IEEE Transactions on Pattern …, 2020 - ieeexplore.ieee.org

Hashing is an efficient method for nearest neighbor search in large-scale data space by
embedding high-dimensional feature descriptors into a similarity preserving Hamming …

Cited by 417 - Related articles - All 7 versions

AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description

J Prudviraj, MI Reddy, C Vishnu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Generating multi-sentence descriptions for video is considered to be the most complex task
in computer vision and natural language understanding due to the intricate nature of video …

Cited by 90 - Related articles - All 4 versions

Unsupervised person re-identification via softened similarity learning

Y Lin, L Xie, Y Wu, C Yan… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

Person re-identification (re-ID) is an important topic in computer vision. This paper studies
the unsupervised setting of re-ID, which does not require any labeled information and thus is …

Cited by 337 - Related articles - All 8 versions

Depth image denoising using nuclear norm and learning graph model

C Yan, Z Li, Y Zhang, Y Liu, X Ji, Y Zhang - ACM Transactions on …, 2020 - dl.acm.org

Depth image denoising is increasingly becoming the hot research topic nowadays, because
it reflects the three-dimensional scene and can be applied in various fields of computer …

Cited by 223 - Related articles - All 3 versions

Precise no-reference image quality evaluation based on distortion identification

C Yan, T Teng, Y Liu, Y Zhang, H Wang… - ACM Transactions on …, 2021 - dl.acm.org

The difficulty of no-reference image quality assessment (NR IQA) often lies in the lack of
knowledge about the distortion in the image, which makes quality assessment blind and …

Cited by 117 - Related articles

TSP: Temporally-sensitive pretraining of video encoders for localization tasks

H Alwassel, S Giancola… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Due to the large memory footprint of untrimmed videos, current state-of-the-art video
localization methods operate atop precomputed video clip features. These features are …

Cited by 150 - Related articles - All 10 versions

Video super-resolution with temporal group attention

T Isobe, S Li, X Jia, S Yuan… - Proceedings of the …, 2020 - openaccess.thecvf.com

Video super-resolution, which aims at producing a high-resolution video from its
corresponding low-resolution version, has recently drawn increasing attention. In this work …

Cited by 212 - Related articles - All 8 versions