YOLO-based object detection models: A review and its applications

A Vijayakumar, S Vairavasundaram - Multimedia Tools and Applications, 2024 - Springer
In computer vision, object detection is the classical and most challenging problem to get
accurate results in detecting objects. With the significant advancement of deep learning …

Feature fusion Vision Transformers using MLP-Mixer for enhanced deepfake detection

E Essa - Neurocomputing, 2024 - Elsevier
Deepfake technology, utilizing deep learning and computer vision, presents significant
security threats by generating highly realistic synthetic media, such as images and videos. In …

OV-VG: A benchmark for open-vocabulary visual grounding

C Wang, W Feng, X Li, G Cheng, S Lyu, B Liu, L Chen… - Neurocomputing, 2024 - Elsevier
Open-vocabulary learning has emerged as a cutting-edge research area, particularly in light
of the widespread adoption of vision-based foundational models. Its primary objective is to …

Overcoming language priors in visual question answering with cumulative learning strategy

A Mao, F Chen, Z Ma, K Lin - Neurocomputing, 2024 - Elsevier
The performance of visual question answering (VQA) has witnessed great progress over the
last few years. However, many current VQA models tend to rely on superficial linguistic …

D3C2-Net: Dual-Domain Deep Convolutional Coding Network for Compressive Sensing

W Li, B Chen, S Liu, S Zhao, B Du… - … on Circuits and …, 2024 - ieeexplore.ieee.org
By mapping iterative optimization algorithms into neural networks (NNs), deep unfolding
networks (DUNs) exhibit well-defined and interpretable structures and achieve remarkable …

Enhancing Image Annotation with Object Tracking and Image Retrieval: A Systematic Review

R Fernandes, A Pessoa, M Salgado, A De Paiva… - IEEE …, 2024 - ieeexplore.ieee.org
Effective image and video annotation is a fundamental pillar in computer vision and artificial
intelligence, crucial for the development of accurate machine learning models. Object …

Late better than early: A decision-level information fusion approach for RGB-Thermal crowd counting with illumination awareness

J Cheng, C Feng, Y Xiao, Z Cao - Neurocomputing, 2024 - Elsevier
In this paper, we make the first research effort to address the RGB-Thermal (RGB-T) crowd
counting problem with the decision-level late fusion manner. Being different from the existing …

Relation-aware Multi-pass Comparison Deconfounded Network for Change Captioning

Z Lu, L Jin, Z Chen, C Tian, X Sun, X Li… - … on Circuits and …, 2024 - ieeexplore.ieee.org
Change captioning aims to describe the semantic change between a pair of images with
natural language while remaining immune to viewpoint change. Based on the encoder …

Exploiting multi-transformer encoder with multiple-hypothesis aggregation via diffusion model for 3D human pose estimation

S Arthanari, JH Jeong, YH Joo - Multimedia Tools and Applications, 2024 - Springer
The transformer architecture has consistently achieved cutting-edge performance in the task
of 2D to 3D lifting human pose estimation. Despite advances in transformer-based methods …

Deep Learning Techniques for Video Instance Segmentation: A Survey

C Xu, CT Li, Y Hu, CP Lim, D Creighton - arXiv preprint arXiv:2310.12393, 2023 - arxiv.org
Video instance segmentation, also known as multi-object tracking and segmentation, is an
emerging computer vision research area introduced in 2019, aiming at detecting …