Temporal 3d convnets: New architecture and transfer learning for video classification

A Diba, M Fayyaz, V Sharma, AH Karami… - arXiv preprint arXiv …, 2017 - arxiv.org
… our transfer learning: 2D → 3D ConvNet performance with generic state-of-the-art 3D CNN
… weight initialization via transfer learning is possible for 3D ConvNet architecture, which can …

Learning spatiotemporal features with 3d convolutional networks

D Tran, L Bourdev, R Fergus… - Proceedings of the …, 2015 - openaccess.thecvf.com
… fully-connected layers which perform well on transfer learning tasks [47… propose to learn
spatio-temporal features using deep 3D ConvNet. We … for video classification. We also verify that …

Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification

S Xie, C Sun, J Huang, Z Tu… - Proceedings of the …, 2018 - openaccess.thecvf.com
… replacing 3D convolutions with spatial and temporal separable 3D convolutions, … video
classification datasets Next we conduct transfer learning experiments from Kinetics to other video

Convnet architecture search for spatiotemporal feature learning

D Tran, J Ray, Z Shou, SF Chang, M Paluri - arXiv preprint arXiv …, 2017 - arxiv.org
architecture search for video classification on UCF101. We showed that our observations are
useful for spatiotemporal feature learning … be matched to a temporal slice of a 3D filter. We …

[PDF][PDF] Rethinking spatiotemporal feature learning for video understanding

S Xie, C Sun, J Huang, Z Tu, K Murphy - arXiv preprint arXiv …, 2017 - chensun.me
… I3D-K, we retain 3D temporal convolutions at the lowest K layers … significantly on video
classification and action detection tasks. … transfer learning experiments from Kinetics to other video

Two-stream 3-d convnet fusion for action recognition in videos with arbitrary size and length

X Wang, L Gao, P Wang, X Sun… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
… , we decompose a video into spatial and temporal shots. By … 3D convNet, which can recognize
human actions in videos of … size and diversity for video classification. In terms of HMDB51 …

Spatio-temporal deformable 3d convnets with attention for action recognition

J Li, X Liu, M Zhang, D Wang - Pattern Recognition, 2020 - Elsevier
… motion information, we propose the temporal deformable 3D ConvNets based on the attention.
The temporal deformable convolution can learn both the temporal and the spatial offsets …

3D convolutional networks with multi-layer-pooling selection fusion for video classification

Z Hu, R Zhang, Y Qiu, M Zhao, Z Sun - Multimedia Tools and Applications, 2021 - Springer
… , for the video representation building based on 3D ConvNets, … spatial and temporal jittering
and different video sample rate. … to learn robust video representation for video classification. …

Would mega-scale datasets further enhance spatiotemporal 3D CNNs?

H Kataoka, T Wakamiya, K Hara, Y Satoh - arXiv preprint arXiv …, 2020 - arxiv.org
… Following a comprehensive study of transfer learning on ImageNet [… the-art video classification
performance in the present paper. … the spatial and temporal volume at each stacked block. …

3D-TDC: A 3D temporal dilation convolution framework for video action recognition

Y Ming, F Feng, C Li, JH Xue - Neurocomputing, 2021 - Elsevier
… into the convolution network at one time, resulting in a limited temporal3D temporal dilation
convolution (3D-TDC) framework, to extract spatio-temporal features of actions from videos. …