Video captioning by adversarial LSTM

YJ Cao, LL Jia, YX Chen, N Lin, C Yang, B Zhang… - IEEE …, 2018 - ieeexplore.ieee.org

The appearance of generative adversarial networks (GAN) provides a new approach and
framework for computer vision. Compared with traditional machine learning algorithms, GAN …

Cited by 203 - Related articles - All 6 versions

A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org

Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …

Cited by 31 - Related articles - All 4 versions

EEG emotion recognition using fusion model of graph convolutional neural networks and LSTM

Y Yin, X Zheng, B Hu, Y Zhang, X Cui - Applied Soft Computing, 2021 - Elsevier

In recent years, graph convolutional neural networks have become research focus and
inspired new ideas for emotion recognition based on EEG. Deep learning has been widely …

Cited by 373 - Related articles

Deep multimodal representation learning: A survey

W Guo, J Wang, S Wang - IEEE Access, 2019 - ieeexplore.ieee.org

Multimodal representation learning, which aims to narrow the heterogeneity gap among
different modalities, plays an indispensable role in the utilization of ubiquitous multimodal …

Cited by 530 - Related articles - All 4 versions

On data augmentation for GAN training

NT Tran, VH Tran, NB Nguyen… - … on Image Processing, 2021 - ieeexplore.ieee.org

Recent successes in Generative Adversarial Networks (GAN) have affirmed the importance
of using more data in GAN training. Yet it is expensive to collect data in many domains such …

Cited by 329 - Related articles - All 7 versions

STAT: Spatial-temporal attention mechanism for video captioning

C Yan, Y Tu, X Wang, Y Zhang, X Hao… - IEEE transactions on …, 2019 - ieeexplore.ieee.org

Video captioning refers to automatically generating natural language sentences, which
summarize the video contents. Inspired by the visual attention mechanism of human beings …

Cited by 404 - Related articles - All 4 versions

Beyond RNNs: Positional self-attention with co-attention for video question answering

X Li, J Song, L Gao, X Liu, W Huang, X He… - Proceedings of the AAAI …, 2019 - ojs.aaai.org

Most of the recent progresses on visual question answering are based on recurrent neural
networks (RNNs) with attention. Despite the success, these models are often time-consuming …

Cited by 308 - Related articles - All 10 versions

Hierarchical LSTMs with adaptive attention for visual captioning

L Gao, X Li, J Song, HT Shen - IEEE transactions on pattern …, 2019 - ieeexplore.ieee.org

Recent progress has been made in using attention based encoder-decoder framework for
image and video captioning. Most existing decoders apply the attention mechanism to every …

Cited by 285 - Related articles - All 5 versions

Exploiting subspace relation in semantic labels for cross-modal hashing

HT Shen, L Liu, Y Yang, X Xu, Z Huang… - … on Knowledge and …, 2020 - ieeexplore.ieee.org

Hashing methods have been extensively applied to efficient multimedia data indexing and
retrieval on account of the explosion of multimedia data. Cross-modal hashing usually …

Cited by 203 - Related articles - All 3 versions

Object-aware aggregation with bidirectional temporal graph for video captioning

J Zhang, Y Peng - Proceedings of the IEEE/CVF conference …, 2019 - openaccess.thecvf.com

Video captioning aims to automatically generate natural language descriptions of video
content, which has drawn a lot of attention in recent years. Generating accurate and fine …

Cited by 216 - Related articles - All 6 versions