Exploring Plain Vision Transformer Backbones for Object Detection
We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for
object detection. This design enables the original ViT architecture to be fine-tuned for object …
Exploring Plain Vision Transformer Backbones for Object Detection
Y Li, H Mao, R Girshick, K He - European Conference on Computer …, 2022 - dl.acm.org
Exploring Plain Vision Transformer Backbones for Object Detection
Y Li, H Mao, R Girshick, K He - arXiv preprint arXiv:2203.16527, 2022 - arxiv.org
[PDF] Exploring Plain Vision Transformer Backbones for Object Detection
Y Li, H Mao, R Girshick, K He - ecva.net
Exploring Plain Vision Transformer Backbones for Object Detection
Y Li, H Mao, R Girshick, K He - arXiv e-prints, 2022 - ui.adsabs.harvard.edu
[PDF] Exploring Plain Vision Transformer Backbones for Object Detection
Y Li, H Mao, R Girshick, K He - arxiv.org