Review of lightweight deep convolutional neural networks
F Chen, S Li, J Han, F Ren, Z Yang - Archives of Computational Methods …, 2024 - Springer
Lightweight deep convolutional neural networks (LDCNNs) are vital components of mobile
intelligence, particularly in mobile vision. Although various heavy networks with increasingly …
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
CY Wang, A Bochkovskiy… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Real-time object detection is one of the most important research topics in computer vision.
As new approaches regarding architecture optimization and training optimization are …
RepViT: Revisiting mobile CNN from ViT perspective
Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …
MobileNetV4: Universal Models for the Mobile Ecosystem
We present the latest generation of MobileNets: MobileNetV4 (MNv4). They feature
universally efficient architecture designs for mobile devices. We introduce the Universal …
Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives
We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …
MobileVLM: A fast, reproducible and strong vision language assistant for mobile devices
We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted
to run on mobile devices. It is an amalgamation of a myriad of architectural designs and …
MobileCLIP: Fast image-text models through multi-modal reinforced training
Contrastive pre-training of image-text foundation models such as CLIP demonstrated
excellent zero-shot performance and improved robustness on a wide range of downstream …
CLIPPING: Distilling CLIP-based models with a student base for video-language retrieval
Pre-training a vision-language model and then fine-tuning it on downstream tasks has
become a popular paradigm. However, pre-trained vision-language models with the …
From Near-Sensor to In-Sensor: A State-of-the-Art Review of Embedded AI Vision Systems
In modern cyber-physical systems, the integration of AI into vision pipelines is now a
standard practice for applications ranging from autonomous vehicles to mobile devices …
Temporal dynamic quantization for diffusion models
Diffusion models have gained popularity in vision applications due to their remarkable
generative performance and versatility. However, their high storage and computation …