Vision-language models for vision tasks: A survey
Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …
A survey on open-vocabulary detection and segmentation: Past, present, and future
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …
An incremental-self-training-guided semi-supervised broad learning system
The broad learning system (BLS) has recently been applied in numerous fields. However, it
is mainly a supervised learning system and thus not suitable for specific practical …
Text-prompt Camouflaged Instance Segmentation with Graduated Camouflage Learning
Camouflaged instance segmentation (CIS) aims to detect and segment objects blending
with their surroundings. While existing CIS methods rely heavily on fully-supervised training …
Non-exemplar Domain Incremental Learning via Cross-Domain Concept Integration
Existing approaches to Domain Incremental Learning (DIL) address catastrophic
forgetting by storing and rehearsing exemplars from old domains. However, exemplar-based …
Proactive schemes: A survey of adversarial attacks for social good
Adversarial attacks in computer vision exploit the vulnerabilities of machine learning models
by introducing subtle perturbations to input data, often leading to incorrect predictions or …
Semantically Enhanced Scene Captions with Physical and Weather Condition Changes
H Sakaino - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Vision-Language models (VLMs), i.e., image-text pairs of CLIP, have boosted image-based
Deep Learning (DL). Moreover, Visual-Question-Answer (VQA) tools and open …
PV-Cap: 3D Dynamic Scene Understanding Through Open Physics-based Vocabulary
H Sakaino, TN Phuong, VN Duy - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, large Vision Language (VL) models, i.e., CLIP, have demonstrated
impressive capabilities in training solely on internet-scale image-language pairs. Moreover …
Dynamic Texts From UAV Perspective Natural Images
H Sakaino - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Drone-based image processing offers valuable capabilities for surveillance, detection, and
tracking in vast areas, aiding in disaster search and rescue and monitoring artificial events …
Advancing Causal Intervention in Image Captioning With Causal Prompt
This article introduces a novel approach, called causal prompting network (CPNet), to
enhance the causal intervention in the context of image captioning. By leveraging visual …