Cross-modal non-linear guided attention and temporal coherence in multi-modal deep video models

Z Luo, S Ghosh, D Guillory, K Kato, T Darrell… - arXiv preprint arXiv …, 2022 - arxiv.org

Action in video usually involves the interaction of human with objects. Action labels are
typically composed of various combinations of verbs and nouns, but we may not have …

Cited by 5, Related articles, All 5 versions

DPED: Bio-inspired dual-pathway network for edge detection

Y Chen, C Lin, Y Qiao - Frontiers in Bioengineering and …, 2022 - frontiersin.org

As the basis of high-level visual tasks, edge detection is significant. Most of the
encoder-decoder edge detection methods used convolutional neural networks, such as VGG16 or …

Cited by 4, Related articles, All 6 versions

Taohighlight: Commodity-aware multi-modal video highlight detection in e-commerce

Z Guo, Z Zhao, W Jin, D Wang, R Liu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

In e-commerce, product related video is important content to introduce product
characteristics and attract consumers. Especially in the recommendation system of e …

Cited by 9, Related articles, All 2 versions

Enhancing transformer for video understanding using gated multi-level attention and temporal adversarial training

S Sahu, P Goyal - arXiv preprint arXiv:2103.10043, 2021 - arxiv.org

The introduction of Transformer model has led to tremendous advancements in sequence
modeling, especially in text domain. However, the use of attention-based models for video …

Cited by 3, Related articles, All 2 versions

Can't Fool Me: Adversarially Robust Transformer for Video Understanding

D Choudhary, P Goyal, S Sahu - arXiv preprint arXiv:2110.13950, 2021 - arxiv.org

Deep neural networks have been shown to perform poorly on adversarial examples. To
address this, several techniques have been proposed to increase robustness of a model for …

Cited by 1, Related articles, All 3 versions