- Academic resource search

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

Cited by 290 · Related articles · All 11 versions

[PDF] arxiv.org

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org

Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Cited by 861 · Related articles · All 8 versions

[PDF] thecvf.com

Generalizing face forgery detection with high-frequency features

Y Luo, Y Zhang, J Yan, W Liu - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Current face forgery detection methods achieve high accuracy under the within-database
scenario where training and testing forgeries are synthesized by the same algorithm …

Cited by 306 · Related articles · All 6 versions

[PDF] thecvf.com

Meshed-memory transformer for image captioning

M Cornia, M Stefanini, L Baraldi… - Proceedings of the …, 2020 - openaccess.thecvf.com

Transformer-based architectures represent the state of the art in sequence modeling tasks
like machine translation and language understanding. Their applicability to multi-modal …

Cited by 998 · Related articles · All 13 versions

[PDF] neurips.cc

Cross attention network for few-shot classification

R Hou, H Chang, B Ma, S Shan… - Advances in neural …, 2019 - proceedings.neurips.cc

Few-shot classification aims to recognize unlabeled samples from unseen classes given
only few labeled samples. The unseen classes and low-data problem make few-shot …

Cited by 684 · Related articles · All 12 versions

[PDF] ieee.org

Classification of remote sensing images using EfficientNet-B3 CNN model with attention

H Alhichri, AS Alswayed, Y Bazi, N Ammour… - IEEE …, 2021 - ieeexplore.ieee.org

Scene classification is a highly useful task in Remote Sensing (RS) applications. Many
efforts have been made to improve the accuracy of RS scene classification. Scene …

Cited by 215 · Related articles · All 4 versions

[PDF] researchgate.net

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer

In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Cited by 168 · Related articles · All 8 versions

[PDF] arxiv.org

Multi-scale self-guided attention for medical image segmentation

A Sinha, J Dolz - IEEE journal of biomedical and health …, 2020 - ieeexplore.ieee.org

Even though convolutional neural networks (CNNs) are driving progress in medical image
segmentation, standard models still have some drawbacks. First, the use of multi-scale …

Cited by 481 · Related articles · All 13 versions

[PDF] thecvf.com

Entangled transformer for image captioning

G Li, L Zhu, P Liu, Y Yang - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com

In image captioning, the typical attention mechanisms are arduous to identify the equivalent
visual signals especially when predicting highly abstract words. This phenomenon is known …

Cited by 378 · Related articles · All 10 versions

[PDF] thecvf.com

Bottom-up and top-down attention for image captioning and visual question answering

P Anderson, X He, C Buehler… - Proceedings of the …, 2018 - openaccess.thecvf.com

Top-down visual attention mechanisms have been used extensively in image captioning
and visual question answering (VQA) to enable deeper image understanding through fine …

Cited by 5025 · Related articles · All 16 versions