G-tad: Sub-graph localization for temporal action detection
Temporal action detection is a fundamental yet challenging task in video understanding.
Video context is a critical cue to effectively detect actions, but current works mainly focus on …
Video context is a critical cue to effectively detect actions, but current works mainly focus on …
Action-net: Multipath excitation for action recognition
Abstract Spatial-temporal, channel-wise, and motion patterns are three complementary and
crucial types of information for video action recognition. Conventional 2D CNNs are …
crucial types of information for video action recognition. Conventional 2D CNNs are …
Tam: Temporal adaptive module for video recognition
Video data is with complex temporal dynamics due to various factors such as camera
motion, speed variation, and different activities. To effectively capture this diverse motion …
motion, speed variation, and different activities. To effectively capture this diverse motion …
Meteornet: Deep learning on dynamic 3d point cloud sequences
Understanding dynamic 3D environment is crucial for robotic agents and many other
applications. We propose a novel neural network architecture called MeteorNet for learning …
applications. We propose a novel neural network architecture called MeteorNet for learning …
Unsupervised learning of accurate siamese tracking
Unsupervised learning has been popular in various computer vision tasks, including visual
object tracking. However, prior unsupervised tracking approaches rely heavily on spatial …
object tracking. However, prior unsupervised tracking approaches rely heavily on spatial …
Motionsqueeze: Neural motion feature learning for video understanding
Motion plays a crucial role in understanding videos and most state-of-the-art neural models
for video classification incorporate motion information typically using optical flows extracted …
for video classification incorporate motion information typically using optical flows extracted …
Stand-alone inter-frame attention in video models
Motion, as the uniqueness of a video, has been critical to the development of video
understanding models. Modern deep learning models leverage motion by either executing …
understanding models. Modern deep learning models leverage motion by either executing …
Causal contextual prediction for learned image compression
Over the past several years, we have witnessed impressive progress in the field of learned
image compression. Recent learned image codecs are commonly based on autoencoders …
image compression. Recent learned image codecs are commonly based on autoencoders …
Social adaptive module for weakly-supervised group activity recognition
This paper presents a new task named weakly-supervised group activity recognition (GAR)
which differs from conventional GAR tasks in that only video-level labels are available, yet …
which differs from conventional GAR tasks in that only video-level labels are available, yet …