Recognize anything: A strong image tagging model

Y Zhang, X Huang, J Ma, Z Li, Z Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present the Recognize Anything Model (RAM): a strong foundation model for
image tagging. RAM makes a substantial step for foundation models in computer vision …

Tag2Text: Guiding vision-language model via image tagging

X Huang, Y Zhang, J Ma, W Tian, R Feng… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents Tag2Text, a vision language pre-training (VLP) framework, which
introduces image tagging into vision-language models to guide the learning of visual …

CNN and transformer framework for insect pest classification

Y Peng, Y Wang - Ecological Informatics, 2022 - Elsevier
Insect pests pose a significant and increasing threat to agricultural production worldwide.
However, most existing recognition methods are built upon well-known convolutional neural …

Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge

G Holste, Y Zhou, S Wang, A Jaiswal, M Lin… - Medical Image …, 2024 - Elsevier
Many real-world image recognition problems, such as diagnostic medical imaging exams,
are “long-tailed”: there are a few common findings followed by many more relatively rare …

Learning to generate semantic layouts for higher text-image correspondence in text-to-image synthesis

M Park, J Yun, S Choi, J Choo - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Existing text-to-image generation approaches have set high standards for photorealism and
text-image correspondence, largely benefiting from web-scale text-image datasets, which …

Multi-label classification with partial annotations using class-aware selective loss

E Ben-Baruch, T Ridnik, I Friedman… - Proceedings of the …, 2022 - openaccess.thecvf.com
Large-scale multi-label classification datasets are commonly, and perhaps inevitably,
partially annotated. That is, only a small subset of labels are annotated per sample. Different …

FlowTransformer: A transformer framework for flow-based network intrusion detection systems

LD Manocchio, S Layeghy, WW Lo… - Expert Systems with …, 2024 - Elsevier
This paper presents the FlowTransformer framework, a novel approach for implementing
transformer-based Network Intrusion Detection Systems (NIDSs). FlowTransformer …

Obj2Seq: Formatting objects as sequences with class prompt for visual tasks

Z Chen, Y Zhu, Z Li, F Yang, W Li… - Advances in …, 2022 - proceedings.neurips.cc
Visual tasks vary a lot in their output formats and concerned contents, therefore it is hard to
process them with an identical structure. One main obstacle lies in the high-dimensional …

Label-aware global consistency for multi-label learning with single positive labels

MK Xie, J Xiao, SJ Huang - Advances in Neural Information …, 2022 - proceedings.neurips.cc
In single positive multi-label learning (SPML), only one of multiple positive labels is
observed for each instance. The previous work trains the model by simply treating …

Prompt stealing attacks against text-to-image generation models

X Shen, Y Qu, M Backes, Y Zhang - arXiv preprint arXiv:2302.09923, 2023 - arxiv.org
Text-to-Image generation models have revolutionized the artwork design process and
enabled anyone to create high-quality images by entering text descriptions called prompts …