Pdan: Pyramid dilated attention network for action detection

R Dai, S Das, L Minciullo, L Garattoni… - Proceedings of the IEEE/CVF Winter Conference on Applications …, 2021 - openaccess.thecvf.com
Abstract
Handling long and complex temporal information is an important factor for action detection tasks. This challenge is further aggravated by densely distributed actions in untrimmed videos. Previous action detection methods fail to select the key temporal information in long videos. To this end, we introduce the Dilated Attention Layer (DAL). Compared to a standard temporal convolution layer, DAL allocates attentional weights to each feature in the kernel, which enables DAL to learn better local representations across time. Furthermore, when equipped with dilated kernels, DAL is able to learn a global representation of videos several minutes long, which is crucial for the task of action detection. Finally, we introduce the Pyramid Dilated Attention Network (PDAN), which is built upon DAL. By combining DAL with dilation and residual links, PDAN can model short-term and long-term temporal relations simultaneously, focusing on local segments at both low and high temporal receptive fields. This property enables PDAN to handle complex temporal relations between different action instances in long untrimmed videos. To corroborate the effectiveness and robustness of the proposed method, we evaluate it on three densely annotated, multi-label datasets: MultiTHUMOS, Charades and an in-house dataset, outperforming the state-of-the-art results.
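To make the mechanism concrete, the following is a minimal NumPy sketch of the two ideas in the abstract: a dilated attention layer that replaces fixed convolution weights with a per-position softmax over the kernel window, and a pyramid that stacks such layers with exponentially growing dilation plus residual links. All function names, the single scoring vector `w_score`, and the layer count are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_attention_layer(x, w_score, kernel_size=3, dilation=1):
    """Sketch of a Dilated Attention Layer (DAL).

    x:        (T, C) feature sequence over T time steps.
    w_score:  (C,) hypothetical scoring vector used to rate each feature
              in the kernel window (a stand-in for learned parameters).

    For every time step, gather `kernel_size` features at dilated offsets,
    turn their scores into attention weights via softmax, and output the
    attention-weighted sum -- i.e. data-dependent weights instead of the
    fixed weights of a temporal convolution.
    """
    T, C = x.shape
    pad = dilation * (kernel_size - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))     # zero-pad so output length is T
    out = np.zeros_like(x)
    for t in range(T):
        idx = [t + k * dilation for k in range(kernel_size)]
        window = xp[idx]                     # (kernel_size, C) dilated window
        attn = softmax(window @ w_score)     # (kernel_size,) attention weights
        out[t] = attn @ window               # weighted sum over the kernel
    return out

def pdan(x, w_scores, kernel_size=3):
    """Sketch of PDAN: stack DALs with dilation 1, 2, 4, ... and residual
    links, so deeper layers cover an exponentially larger temporal span
    while early layers keep modelling short-term relations."""
    for i, w in enumerate(w_scores):
        x = x + dilated_attention_layer(x, w, kernel_size, dilation=2 ** i)
    return x
```

With five layers and kernel size 3, the top layer's receptive field already spans dozens of time steps, which illustrates how the pyramid captures short-term and long-term relations simultaneously.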

