Disentangled action recognition with knowledge bases

Z Luo, S Ghosh, D Guillory, K Kato, T Darrell… - arXiv preprint arXiv …, 2022 - arxiv.org
Action in video usually involves the interaction of human with objects. Action labels are
typically composed of various combinations of verbs and nouns, but we may not have …

DPED: Bio-inspired dual-pathway network for edge detection

Y Chen, C Lin, Y Qiao - Frontiers in Bioengineering and …, 2022 - frontiersin.org
As the basis of high-level visual tasks, edge detection is significant. Most of the
encoder-decoder edge detection methods used convolutional neural networks, such as VGG16 or …

Taohighlight: Commodity-aware multi-modal video highlight detection in e-commerce

Z Guo, Z Zhao, W Jin, D Wang, R Liu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In e-commerce, product related video is important content to introduce product
characteristics and attract consumers. Especially in the recommendation system of e …

Enhancing transformer for video understanding using gated multi-level attention and temporal adversarial training

S Sahu, P Goyal - arXiv preprint arXiv:2103.10043, 2021 - arxiv.org
The introduction of Transformer model has led to tremendous advancements in sequence
modeling, especially in text domain. However, the use of attention-based models for video …

Can't Fool Me: Adversarially Robust Transformer for Video Understanding

D Choudhary, P Goyal, S Sahu - arXiv preprint arXiv:2110.13950, 2021 - arxiv.org
Deep neural networks have been shown to perform poorly on adversarial examples. To
address this, several techniques have been proposed to increase robustness of a model for …