Zero-shot object detection

Applications of machine vision in agricultural robot navigation: A review

T Wang, B Chen, Z Zhang, H Li, M Zhang - Computers and Electronics in …, 2022 - Elsevier

Many tasks in smart agriculture have further requirements for the autonomous navigation of
agricultural robots. Due to irreplaceable visual information and low-cost hardware costs …

Cited by 147 · Related articles · All 5 versions

Few-shot object detection: A survey

S Antonelli, D Avola, L Cinque, D Crisostomi… - ACM Computing …, 2022 - dl.acm.org

Deep learning approaches have recently raised the bar in many fields, from Natural
Language Processing to Computer Vision, by leveraging large amounts of data. However …

Cited by 87 · Related articles · All 5 versions

Simple open-vocabulary object detection

M Minderer, A Gritsenko, A Stone, M Neumann… - … on Computer Vision, 2022 - Springer

Combining simple architectures with large-scale pre-training has led to massive
improvements in image classification. For object detection, pre-training and scaling …

Cited by 382 · Related articles · All 10 versions

GLIPv2: Unifying localization and vision-language understanding

H Zhang, P Zhang, X Hu, YC Chen… - Advances in …, 2022 - proceedings.neurips.cc

We present GLIPv2, a grounded VL understanding model, that serves both localization tasks
(e.g., object detection, instance segmentation) and Vision-Language (VL) understanding …

Cited by 253 · Related articles · All 4 versions

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

This paper presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models that demonstrate vision and vision-language capabilities …

Cited by 143 · Related articles · All 6 versions

Detecting twenty-thousand classes using image-level supervision

X Zhou, R Girdhar, A Joulin, P Krähenbühl… - European Conference on …, 2022 - Springer

Current object detectors are limited in vocabulary size due to the small scale of detection
datasets. Image classifiers, on the other hand, reason about much larger vocabularies, as …

Cited by 524 · Related articles · All 8 versions

Grounded language-image pre-training

LH Li, P Zhang, H Zhang, J Yang, C Li… - Proceedings of the …, 2022 - openaccess.thecvf.com

This paper presents a grounded language-image pre-training (GLIP) model for learning
object-level, language-aware, and semantic-rich visual representations. GLIP unifies object …

Cited by 903 · Related articles · All 8 versions

RegionCLIP: Region-based language-image pretraining

Y Zhong, J Yang, P Zhang, C Li… - Proceedings of the …, 2022 - openaccess.thecvf.com

Contrastive language-image pretraining (CLIP) using image-text pairs has achieved
impressive results on image classification in both zero-shot and transfer learning settings …

Cited by 477 · Related articles · All 6 versions

Florence: A new foundation model for computer vision

L Yuan, D Chen, YL Chen, N Codella, X Dai… - arXiv preprint arXiv …, 2021 - arxiv.org

Automated visual understanding of our diverse and open world demands computer vision
models to generalize well with minimal customization for specific tasks, similar to human …

Cited by 820 · Related articles · All 2 versions

Learning to prompt for open-vocabulary object detection with vision-language model

Y Du, F Wei, Z Zhang, M Shi… - Proceedings of the …, 2022 - openaccess.thecvf.com

Recently, vision-language pre-training shows great potential in open-vocabulary object
detection, where detectors trained on base classes are devised for detecting new classes …

Cited by 292 · Related articles · All 10 versions

Applications of machine vision in agricultural robot navigation: A review
