Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …

An incremental-self-training-guided semi-supervised broad learning system

J Guo, Z Liu, CLP Chen - IEEE Transactions on Neural …, 2024 - ieeexplore.ieee.org
The broad learning system (BLS) has recently been applied in numerous fields. However, it
is mainly a supervised learning system and thus not suitable for specific practical …

Text-prompt Camouflaged Instance Segmentation with Graduated Camouflage Learning

Z He, C Xia, S Qiao, J Li - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Camouflaged instance segmentation (CIS) aims to detect and segment objects blending
with their surroundings. While existing CIS methods rely heavily on fully-supervised training …

Non-exemplar Domain Incremental Learning via Cross-Domain Concept Integration

Q Wang, Y He, S Dong, X Gao, S Wang… - European Conference on …, 2025 - Springer
Existing approaches to Domain Incremental Learning (DIL) address catastrophic
forgetting by storing and rehearsing exemplars from old domains. However, exemplar-based …

Proactive schemes: A survey of adversarial attacks for social good

V Asnani, X Yin, X Liu - arXiv preprint arXiv:2409.16491, 2024 - arxiv.org
Adversarial attacks in computer vision exploit the vulnerabilities of machine learning models
by introducing subtle perturbations to input data, often leading to incorrect predictions or …

Semantically Enhanced Scene Captions with Physical and Weather Condition Changes

H Sakaino - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Vision-Language models (VLMs), i.e., image-text pairs of CLIP, have boosted image-based
Deep Learning (DL). Moreover, Visual-Question-Answer (VQA) tools and open …

PV-Cap: 3D Dynamic Scene Understanding Through Open Physics-based Vocabulary

H Sakaino, TN Phuong, VN Duy - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, large Vision Language (VL) models, i.e., CLIP, have demonstrated
impressive capabilities in training solely on internet-scale image-language pairs. Moreover …

Dynamic Texts From UAV Perspective Natural Images

H Sakaino - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Drone-based image processing offers valuable capabilities for surveillance, detection, and
tracking in vast areas, aiding in disaster search and rescue and monitoring artificial events …

Advancing Causal Intervention in Image Captioning With Causal Prompt

Y Yu, Y Kim, YM Ro - IEEE Transactions on Neural Networks …, 2024 - ieeexplore.ieee.org
This article introduces a novel approach, called causal prompting network (CPNet), to
enhance the causal intervention in the context of image captioning. By leveraging visual …