Disentangled action recognition with knowledge bases
Action in video usually involves the interaction of human with objects. Action labels are
typically composed of various combinations of verbs and nouns, but we may not have …
DPED: Bio-inspired dual-pathway network for edge detection
Y Chen, C Lin, Y Qiao - Frontiers in Bioengineering and …, 2022 - frontiersin.org
As the basis of high-level visual tasks, edge detection is significant. Most of the encoder-
decoder edge detection methods used convolutional neural networks, such as VGG16 or …
Taohighlight: Commodity-aware multi-modal video highlight detection in e-commerce
In e-commerce, product related video is important content to introduce product
characteristics and attract consumers. Especially in the recommendation system of e …
Enhancing transformer for video understanding using gated multi-level attention and temporal adversarial training
The introduction of Transformer model has led to tremendous advancements in sequence
modeling, especially in text domain. However, the use of attention-based models for video …
Can't Fool Me: Adversarially Robust Transformer for Video Understanding
Deep neural networks have been shown to perform poorly on adversarial examples. To
address this, several techniques have been proposed to increase robustness of a model for …