Apostol (Paul) Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan...

S Ravi, P Climent-Pérez, F Florez-Revuelta - Multimedia Tools and …, 2024 - Springer

This paper reviews the state of the art in visual privacy protection techniques, with particular
attention paid to techniques applicable to the field of Active and Assisted Living (AAL). A …

Cited by 34

SpeedNet: Learning the speediness in videos

S Benaim, A Ephrat, O Lang, I Mosseri… - Proceedings of the …, 2020 - openaccess.thecvf.com

We wish to automatically predict the "speediness" of moving objects in videos: whether they
move faster, at, or slower than their "natural" speed. The core component in our approach is …

Cited by 281

Self-supervised video transformer

K Ranasinghe, M Naseer, S Khan… - Proceedings of the …, 2022 - openaccess.thecvf.com

In this paper, we propose self-supervised training for video transformers using unlabeled
video data. From a given video, we create local and global spatiotemporal views with …

Cited by 84

Spoken moments: Learning joint audio-visual representations from video descriptions

M Monfort, SY Jin, A Liu, D Harwath… - Proceedings of the …, 2021 - openaccess.thecvf.com

When people observe events, they are able to abstract key information and build concise
summaries of what is happening. These summaries include contextual and semantic …

Cited by 64

How transferable are video representations based on synthetic data?

Y Kim, S Mishra, SY Jin, R Panda… - Advances in …, 2022 - proceedings.neurips.cc

Action recognition has improved dramatically with massive-scale video datasets. Yet, these
datasets are accompanied with issues related to curation cost, privacy, ethics, bias, and …

Cited by 19

Efficient video classification using fewer frames

S Bhardwaj, M Srinivasan… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Recently, there has been a lot of interest in building compact models for video classification
which have a small memory footprint (< 1 GB). While these models are compact, they …

Cited by 104

Masking modalities for cross-modal video retrieval

V Gabeur, A Nagrani, C Sun… - Proceedings of the …, 2022 - openaccess.thecvf.com

Pre-training on large scale unlabelled datasets has shown impressive performance
improvements in the fields of computer vision and natural language processing. Given the …

Cited by 37

Alignment-uniformity aware representation learning for zero-shot video classification

S Pu, K Zhao, M Zheng - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com

Most methods tackle zero-shot video classification by aligning visual-semantic
representations within seen classes, which limits generalization to unseen classes. To …

Cited by 18

Cross-modal generalization: Learning in low resource modalities via meta-alignment

PP Liang, P Wu, L Ziyin, LP Morency… - Proceedings of the 29th …, 2021 - dl.acm.org

How can we generalize to a new prediction task at test time when it also uses a new
modality as input? More importantly, how can we do this with as little annotated data as …

Cited by 28

Scalable and accurate self-supervised multimodal representation learning without aligned video and text data

V Lialin, S Rawls, D Chan, S Ghosh… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scaling up weakly-supervised datasets has shown to be highly effective in the image-text
domain and has contributed to most of the recent state-of-the-art computer vision and …

Cited by 6