FAIR1M: A benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery

X Sun, P Wang, Z Yan, F Xu, R Wang, W Diao… - ISPRS Journal of …, 2022 - Elsevier
With the rapid development of deep learning, many deep learning-based approaches have
made great achievements in object detection tasks. It is generally known that deep learning …

Artificial intelligence in the creative industries: a review

N Anantrasirichai, D Bull - Artificial intelligence review, 2022 - Springer
This paper reviews the current state of the art in artificial intelligence (AI) technologies and
applications in the context of the creative industries. A brief background of AI, and …

ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and a performance boost in the early …
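
The masked-autoencoder angle in the title is easy to illustrate. Below is a minimal sketch of MAE-style random patch masking in PyTorch; the function name and the 0.6 mask ratio are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def random_patch_mask(batch: int, num_patches: int, mask_ratio: float = 0.6):
    """Randomly mask a fraction of patches, MAE-style.

    Returns a boolean mask of shape (batch, num_patches) where True marks
    patches hidden from the encoder. The 0.6 ratio is an illustrative
    default, not the paper's exact setting.
    """
    num_masked = int(num_patches * mask_ratio)
    # Random scores per patch; the lowest-scoring patches get masked.
    scores = torch.rand(batch, num_patches)
    idx = scores.argsort(dim=1)
    mask = torch.zeros(batch, num_patches, dtype=torch.bool)
    mask.scatter_(1, idx[:, :num_masked], True)
    return mask

mask = random_patch_mask(2, 196)  # e.g. a 14x14 grid of patches
```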

BiFormer: Vision transformer with bi-level routing attention

L Zhu, X Wang, Z Ke, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
As the core building block of vision transformers, attention is a powerful tool to capture long-
range dependencies. However, such power comes at a cost: it incurs a huge computation …
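
A quick back-of-envelope sketch of why full attention is expensive: its cost grows quadratically in the number of tokens. The function and constants below are illustrative assumptions, but they show the blow-up that routed or sparse attention schemes like BiFormer's aim to tame.

```python
def attention_cost(h_tokens: int, w_tokens: int, dim: int) -> int:
    """Rough FLOP count for one full self-attention layer: the QK^T and
    attn @ V products each cost ~N^2 * dim multiply-adds, N = h * w tokens."""
    n = h_tokens * w_tokens
    return 2 * n * n * dim

# Doubling the feature-map side quadruples N and gives ~16x the attention
# cost, which is the quadratic blow-up sparse/routed attention avoids.
print(attention_cost(14, 14, 64))   # 196 tokens  -> ~4.9M FLOPs
print(attention_cost(28, 28, 64))   # 784 tokens  -> ~78.7M FLOPs
```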

DiffusionDet: Diffusion model for object detection

S Chen, P Sun, Y Song, P Luo - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …
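
The "noisy boxes to object boxes" formulation builds on the standard forward-diffusion step. Here is a minimal sketch, assuming normalized cxcywh boxes and a precomputed noise schedule; DiffusionDet's actual training adds details such as signal scaling that this omits.

```python
import torch

def noise_boxes(boxes: torch.Tensor, alpha_bar_t: float) -> torch.Tensor:
    """Standard forward-diffusion step q(z_t | z_0) applied to box
    coordinates: z_t = sqrt(a_bar) * z_0 + sqrt(1 - a_bar) * eps.
    `boxes` is (N, 4) in normalized cxcywh; alpha_bar_t comes from a noise
    schedule. This is a sketch, not the paper's full training recipe."""
    eps = torch.randn_like(boxes)
    return (alpha_bar_t ** 0.5) * boxes + ((1 - alpha_bar_t) ** 0.5) * eps

gt = torch.tensor([[0.5, 0.5, 0.2, 0.3]])
noisy = noise_boxes(gt, alpha_bar_t=0.3)  # input the detector learns to denoise
```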

Vision mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long sequence modeling …
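
The underlying SSM recurrence is linear in sequence length, which is where the efficiency claim comes from. Below is a minimal NumPy sketch of the plain (non-selective) recurrence; Mamba itself uses input-dependent parameters and a hardware-aware parallel scan, which this does not implement.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear-time scan of a discretized state-space model:
    h_t = A @ h_{t-1} + B @ x_t,  y_t = C @ h_t.
    This is the plain SSM recurrence; Mamba adds input-dependent
    (selective) parameters and a hardware-aware parallel scan."""
    d_state = A.shape[0]
    h = np.zeros(d_state)
    ys = []
    for x_t in x:                     # sequential over tokens, O(L)
        h = A @ h + B @ np.atleast_1d(x_t)
        ys.append(C @ h)
    return np.stack(ys)

L, d = 16, 4                          # toy sizes for illustration
y = ssm_scan(np.random.randn(L), np.random.randn(d, d) * 0.1,
             np.random.randn(d, 1), np.random.randn(1, d))
```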

Vision transformer adapter for dense predictions

Z Chen, Y Duan, W Wang, J He, T Lu, J Dai… - arXiv preprint arXiv …, 2022 - arxiv.org
This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike
recent visual transformers that introduce vision-specific inductive biases into their …
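
For context, the generic adapter pattern is a small bottleneck module attached to a (often frozen) backbone. The sketch below shows only that generic pattern; ViT-Adapter's actual design differs, injecting spatial priors through dedicated interaction blocks.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic adapter: down-project, nonlinearity, up-project, residual.
    Illustrates the adapter idea in general; ViT-Adapter itself uses a
    spatial prior module with injector/extractor blocks instead."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

tokens = torch.randn(2, 196, 768)        # (batch, patches, dim)
out = BottleneckAdapter(768)(tokens)
```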

EfficientViT: Memory efficient vision transformer with cascaded group attention

X Liu, H Peng, N Zheng, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers have shown great success due to their high model capacity.
However, their remarkable performance is accompanied by heavy computation costs, which …
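
A rough sketch of the cascaded-group-attention idea: each head attends over its own channel slice, and each head's output feeds into the next head's input. This is a simplification with assumed shapes, not the exact EfficientViT block.

```python
import torch
import torch.nn as nn

class CascadedGroupAttentionSketch(nn.Module):
    """Each head sees a cheaper channel slice (group) of the input, and
    heads are cascaded: head h's output is added to head h+1's input.
    Simplified; see the paper for the exact EfficientViT block."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads, self.hd = heads, dim // heads
        self.qkv = nn.ModuleList(nn.Linear(self.hd, 3 * self.hd)
                                 for _ in range(heads))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (B, N, dim)
        chunks = x.chunk(self.heads, dim=-1)    # one channel slice per head
        outs, carry = [], 0
        for h, xh in enumerate(chunks):
            q, k, v = self.qkv[h](xh + carry).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1) / self.hd ** 0.5).softmax(-1)
            carry = attn @ v                    # cascaded into the next head
            outs.append(carry)
        return self.proj(torch.cat(outs, dim=-1))

y = CascadedGroupAttentionSketch(64)(torch.randn(2, 49, 64))
```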

Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs

X Ding, X Zhang, J Han, G Ding - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …
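
The trick that keeps very large kernels affordable is making them depthwise. A one-layer sketch follows; the paper's full recipe also involves reparameterization with parallel small kernels, which this omits.

```python
import torch
import torch.nn as nn

# A 31x31 *depthwise* convolution: each channel gets its own kernel
# (groups=channels), so parameters/FLOPs grow with k^2 * C rather than
# k^2 * C^2. This line shows only the basic layer, not the full block.
dw_conv = nn.Conv2d(256, 256, kernel_size=31, padding=15, groups=256)
y = dw_conv(torch.randn(1, 256, 56, 56))   # spatial size preserved
print(y.shape)  # torch.Size([1, 256, 56, 56])
```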

DINO: DETR with improved denoising anchor boxes for end-to-end object detection

H Zhang, F Li, S Liu, L Zhang, H Su, J Zhu… - arXiv preprint arXiv …, 2022 - arxiv.org
We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art
end-to-end object detector. DINO improves over …
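
The "denoising anchor boxes" idea can be sketched by jittering ground-truth boxes into extra queries the decoder must reconstruct. Below is a simplified, positive-only sketch with an assumed noise scale; DINO's contrastive scheme additionally constructs negative (harder) queries.

```python
import torch

def make_denoising_queries(gt_boxes: torch.Tensor, box_noise: float = 0.4):
    """Jitter ground-truth boxes (normalized cxcywh) to build extra
    'denoising' queries for the decoder to reconstruct. Simplified,
    positive-only sketch with an illustrative noise scale."""
    cxcy, wh = gt_boxes[:, :2], gt_boxes[:, 2:]
    # Shift centers within the box and rescale width/height.
    noisy_cxcy = cxcy + (torch.rand_like(cxcy) * 2 - 1) * box_noise * wh / 2
    noisy_wh = wh * (1 + (torch.rand_like(wh) * 2 - 1) * box_noise)
    return torch.cat([noisy_cxcy, noisy_wh], dim=1).clamp(0, 1)

gt = torch.tensor([[0.5, 0.5, 0.2, 0.3]])
queries = make_denoising_queries(gt)
```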