Egocentric video task translation

Z Xue, Y Song, K Grauman… - Proceedings of the …, 2023 - openaccess.thecvf.com
Different video understanding tasks are typically treated in isolation, and even with distinct
types of curated data (e.g., classifying sports in one dataset, tracking animals in another) …

Egocentric video task translation @ Ego4D challenge 2022

Z Xue, Y Song, K Grauman, L Torresani - arXiv preprint arXiv:2302.01891, 2023 - arxiv.org
This technical report describes the EgoTask Translation approach that explores relations
among a set of egocentric video tasks in the Ego4D challenge. To improve the primary task …

Efficient video representation learning via motion-aware token selection

S Hwang, J Yoon, Y Lee, SJ Hwang - arXiv preprint arXiv:2211.10636, 2022 - arxiv.org
Recently emerged Masked Video Modeling techniques demonstrated their potential by
significantly outperforming previous methods in self-supervised learning for video. However …

Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022

J Lei, S Ma, Z Ba, S Vemprala, A Kapoor… - arXiv preprint arXiv …, 2022 - arxiv.org
In this report, we present our approach and empirical results of applying masked
autoencoders in two egocentric video understanding tasks, namely, Object State Change …

EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens

S Hwang, J Yoon, Y Lee, SJ Hwang - openreview.net
Masked video autoencoder approaches have demonstrated their potential by significantly
outperforming previous self-supervised learning methods in video representation learning …