Remote sensing image scene classification: Benchmark and state of the art

G Cheng, J Han, X Lu - Proceedings of the IEEE, 2017 - ieeexplore.ieee.org
Remote sensing image scene classification plays an important role in a wide range of
applications and hence has been receiving remarkable attention. During the past years …

Efficient structure from motion for large-scale UAV images: A review and a comparison of SfM tools

S Jiang, C Jiang, W Jiang - ISPRS Journal of Photogrammetry and Remote …, 2020 - Elsevier
Unmanned aerial vehicle (UAV) images have gained extensive attention in varying fields,
and the Structure from Motion (SfM) technique has become the gold standard for aerial …

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Diffusion art or digital forgery? Investigating data replication in diffusion models

G Somepalli, V Singla, M Goldblum… - Proceedings of the …, 2023 - openaccess.thecvf.com
Cutting-edge diffusion models produce images with high quality and customizability,
enabling them to be used for commercial art and graphic design purposes. But do diffusion …

Emerging properties in self-supervised vision transformers

M Caron, H Touvron, I Misra, H Jégou… - Proceedings of the …, 2021 - openaccess.thecvf.com
In this paper, we question if self-supervised learning provides new properties to Vision
Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the …

Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks

M Goldblum, H Souri, R Ni, M Shu… - Advances in …, 2024 - proceedings.neurips.cc
Neural network based computer vision systems are typically built on a backbone, a
pretrained or randomly initialized feature extractor. Several years ago, the default option was …

MovieNet: A holistic dataset for movie understanding

Q Huang, Y Xiong, A Rao, J Wang, D Lin - Computer Vision–ECCV 2020 …, 2020 - Springer
Recent years have seen remarkable advances in visual understanding. However, how to
understand a story-based long video with artistic styles, e.g., movie, remains challenging. In …

Vision models are more robust and fair when pretrained on uncurated images without supervision

P Goyal, Q Duval, I Seessel, M Caron, I Misra… - arXiv preprint arXiv …, 2022 - arxiv.org
Discriminative self-supervised learning allows training models on any random group of
internet images, and possibly recover salient information that helps differentiate between the …

Are labels required for improving adversarial robustness?

JB Alayrac, J Uesato, PS Huang… - Advances in …, 2019 - proceedings.neurips.cc
Recent work has uncovered the interesting (and somewhat surprising) finding that training
models to be invariant to adversarial perturbations requires substantially larger datasets …

Region-based convolutional networks for accurate object detection and segmentation

R Girshick, J Donahue, T Darrell… - IEEE transactions on …, 2015 - ieeexplore.ieee.org
Object detection performance, as measured on the canonical PASCAL VOC Challenge
datasets, plateaued in the final years of the competition. The best-performing methods were …