Ml-decoder: Scalable and versatile classification head

Recognize anything: A strong image tagging model

Y Zhang, X Huang, J Ma, Z Li, Z Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract We present the Recognize Anything Model (RAM): a strong foundation model for
image tagging. RAM makes a substantial step for foundation models in computer vision …

Cited by 96 · Related articles · All 3 versions

Tag2text: Guiding vision-language model via image tagging

X Huang, Y Zhang, J Ma, W Tian, R Feng… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper presents Tag2Text, a vision language pre-training (VLP) framework, which
introduces image tagging into vision-language models to guide the learning of visual …

Cited by 38 · Related articles · All 3 versions

CNN and transformer framework for insect pest classification

Y Peng, Y Wang - Ecological Informatics, 2022 - Elsevier

Insect pests pose a significant and increasing threat to agricultural production worldwide.
However, most existing recognition methods are built upon well-known convolutional neural …

Cited by 27 · Related articles · All 3 versions

Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge

G Holste, Y Zhou, S Wang, A Jaiswal, M Lin… - Medical Image …, 2024 - Elsevier

Many real-world image recognition problems, such as diagnostic medical imaging exams,
are “long-tailed”: there are a few common findings followed by many more relatively rare …

Cited by 1 · Related articles · All 6 versions

Learning to generate semantic layouts for higher text-image correspondence in text-to-image synthesis

M Park, J Yun, S Choi, J Choo - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Existing text-to-image generation approaches have set high standards for photorealism and
text-image correspondence, largely benefiting from web-scale text-image datasets, which …

Cited by 4 · Related articles · All 5 versions

Multi-label classification with partial annotations using class-aware selective loss

E Ben-Baruch, T Ridnik, I Friedman… - Proceedings of the …, 2022 - openaccess.thecvf.com

Large-scale multi-label classification datasets are commonly, and perhaps inevitably,
partially annotated. That is, only a small subset of labels are annotated per sample. Different …

Cited by 36 · Related articles · All 7 versions

Flowtransformer: A transformer framework for flow-based network intrusion detection systems

LD Manocchio, S Layeghy, WW Lo… - Expert Systems with …, 2024 - Elsevier

This paper presents the FlowTransformer framework, a novel approach for implementing
transformer-based Network Intrusion Detection Systems (NIDSs). FlowTransformer …

Cited by 11 · Related articles · All 4 versions

Obj2seq: Formatting objects as sequences with class prompt for visual tasks

Z Chen, Y Zhu, Z Li, F Yang, W Li… - Advances in …, 2022 - proceedings.neurips.cc

Visual tasks vary a lot in their output formats and concerned contents, therefore it is hard to
process them with an identical structure. One main obstacle lies in the high-dimensional …

Cited by 18 · Related articles · All 6 versions

Label-aware global consistency for multi-label learning with single positive labels

MK Xie, J Xiao, SJ Huang - Advances in Neural Information …, 2022 - proceedings.neurips.cc

In single positive multi-label learning (SPML), only one of multiple positive labels is
observed for each instance. The previous work trains the model by simply treating …

Cited by 17 · Related articles · All 4 versions

Prompt stealing attacks against text-to-image generation models

X Shen, Y Qu, M Backes, Y Zhang - arXiv preprint arXiv:2302.09923, 2023 - arxiv.org

Text-to-Image generation models have revolutionized the artwork design process and
enabled anyone to create high-quality images by entering text descriptions called prompts …

Cited by 16 · Related articles · All 3 versions