Review of lightweight deep convolutional neural networks

F Chen, S Li, J Han, F Ren, Z Yang - Archives of Computational Methods …, 2024 - Springer
Lightweight deep convolutional neural networks (LDCNNs) are vital components of mobile
intelligence, particularly in mobile vision. Although various heavy networks with increasingly …

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

CY Wang, A Bochkovskiy… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Real-time object detection is one of the most important research topics in computer vision.
As new approaches regarding architecture optimization and training optimization are …

RepViT: Revisiting mobile CNN from ViT perspective

A Wang, H Chen, Z Lin, J Han… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …

MobileNetV4: Universal Models for the Mobile Ecosystem

D Qin, C Leichner, M Delakis, M Fornoni, S Luo… - … on Computer Vision, 2025 - Springer
We present the latest generation of MobileNets: MobileNetV4 (MNv4). They feature
universally-efficient architecture designs for mobile devices. We introduce the Universal …

Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

MobileVLM: A fast, reproducible and strong vision language assistant for mobile devices

X Chu, L Qiao, X Lin, S Xu, Y Yang, Y Hu, F Wei… - arXiv preprint arXiv …, 2023 - arxiv.org
We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted
to run on mobile devices. It is an amalgamation of a myriad of architectural designs and …

MobileCLIP: Fast image-text models through multi-modal reinforced training

PKA Vasu, H Pouransari, F Faghri… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive pre-training of image-text foundation models such as CLIP demonstrated
excellent zero-shot performance and improved robustness on a wide range of downstream …

CLIPPING: Distilling CLIP-based models with a student base for video-language retrieval

R Pei, J Liu, W Li, B Shao, S Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Pre-training a vision-language model and then fine-tuning it on downstream tasks has
become a popular paradigm. However, pre-trained vision-language models with the …

From Near-Sensor to In-Sensor: A State-of-the-Art Review of Embedded AI Vision Systems

W Fabre, K Haroun, V Lorrain, M Lepecq, G Sicard - Sensors, 2024 - mdpi.com
In modern cyber-physical systems, the integration of AI into vision pipelines is now a
standard practice for applications ranging from autonomous vehicles to mobile devices …

Temporal dynamic quantization for diffusion models

J So, J Lee, D Ahn, H Kim… - Advances in Neural …, 2024 - proceedings.neurips.cc
Diffusion models have gained popularity in vision applications due to their remarkable
generative performance and versatility. However, their high storage and computation …