YOLO-based object detection models: A review and its applications

A Vijayakumar, S Vairavasundaram - Multimedia Tools and Applications, 2024 - Springer
In computer vision, object detection is the classical and most challenging problem to get
accurate results in detecting objects. With the significant advancement of deep learning …

Differential feature awareness network within antagonistic learning for infrared-visible object detection

R Zhang, L Li, Q Zhang, J Zhang, L Xu… - … on Circuits and …, 2023 - ieeexplore.ieee.org
The combination of infrared and visible videos aims to gather more comprehensive feature
information from multiple sources and reach superior results on various practical tasks, such …

Solve the puzzle of instance segmentation in videos: A weakly supervised framework with spatio-temporal collaboration

L Yan, Q Wang, S Ma, J Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Instance segmentation in videos, which aims to segment and track multiple objects in video
frames, has garnered a flurry of research attention in recent years. In this paper, we present …

Prompt Learns Prompt: Exploring Knowledge-Aware Generative Prompt Collaboration For Video Captioning

L Yan, C Han, Z Xu, D Liu, Q Wang - IJCAI, 2023 - ijcai.org
Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches
have been introduced to learn fixed textual or visual prompts while freezing the pre-trained …

Holistic prototype attention network for few-shot video object segmentation

Y Tang, T Chen, X Jiang, Y Yao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of unseen
classes by resorting to a small set of support images that contain pixel-level object …

Dual-constraint coarse-to-fine network for camouflaged object detection

G Yue, H Xiao, H Xie, T Zhou, W Zhou… - … on Circuits and …, 2023 - ieeexplore.ieee.org
Camouflaged object detection (COD) is an important yet challenging task, with great
application values in industrial defect detection, medical care, etc. The challenges mainly …

Feature fusion Vision Transformers using MLP-Mixer for enhanced deepfake detection

E Essa - Neurocomputing, 2024 - Elsevier
Deepfake technology, utilizing deep learning and computer vision, presents significant
security threats by generating highly realistic synthetic media, such as images and videos. In …

OV-VG: A benchmark for open-vocabulary visual grounding

C Wang, W Feng, X Li, G Cheng, S Lyu, B Liu, L Chen… - Neurocomputing, 2024 - Elsevier
Open-vocabulary learning has emerged as a cutting-edge research area, particularly in light
of the widespread adoption of vision-based foundational models. Its primary objective is to …

Semantic-aware Contrastive Learning with Proposal Suppression for Video Semantic Role Grounding

M Liu, D Zhou, J Guo, X Luo, Z Gao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Video semantic role grounding has gained substantial interest from both the academic and
industrial communities. While existing methods have demonstrated considerable …

CML-MOTS: Collaborative multi-task learning for multi-object tracking and segmentation

Y Cui, C Han, D Liu - arXiv preprint arXiv:2311.00987, 2023 - arxiv.org
The advancement of computer vision has pushed visual analysis tasks from still images to
the video domain. In recent years, video instance segmentation, which aims to track and …