Remote sensing image scene classification: Benchmark and state of the art
Remote sensing image scene classification plays an important role in a wide range of
applications and hence has been receiving remarkable attention. During the past years …
Efficient structure from motion for large-scale UAV images: A review and a comparison of SfM tools
Unmanned aerial vehicle (UAV) images have gained extensive attention in varying fields,
and the Structure from Motion (SfM) technique has become the gold standard for aerial …
DINOv2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
Diffusion art or digital forgery? Investigating data replication in diffusion models
Cutting-edge diffusion models produce images with high quality and customizability,
enabling them to be used for commercial art and graphic design purposes. But do diffusion …
Emerging properties in self-supervised vision transformers
In this paper, we question if self-supervised learning provides new properties to Vision
Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the …
Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks
Neural network based computer vision systems are typically built on a backbone, a
pretrained or randomly initialized feature extractor. Several years ago, the default option was …
MovieNet: A holistic dataset for movie understanding
Recent years have seen remarkable advances in visual understanding. However, how to
understand a story-based long video with artistic styles, e.g., a movie, remains challenging. In …
Vision models are more robust and fair when pretrained on uncurated images without supervision
Discriminative self-supervised learning allows training models on any random group of
internet images, and possibly recovering salient information that helps differentiate between the …
Are labels required for improving adversarial robustness?
Recent work has uncovered the interesting (and somewhat surprising) finding that training
models to be invariant to adversarial perturbations requires substantially larger datasets …
Region-based convolutional networks for accurate object detection and segmentation
Object detection performance, as measured on the canonical PASCAL VOC Challenge
datasets, plateaued in the final years of the competition. The best-performing methods were …