Yolo-based object detection models: A review and its applications
A Vijayakumar, S Vairavasundaram - Multimedia Tools and Applications, 2024 - Springer
In computer vision, object detection is the classical and most challenging problem to get
accurate results in detecting objects. With the significant advancement of deep learning …
Feature fusion Vision Transformers using MLP-Mixer for enhanced deepfake detection
E Essa - Neurocomputing, 2024 - Elsevier
Deepfake technology, utilizing deep learning and computer vision, presents significant
security threats by generating highly realistic synthetic media, such as images and videos. In …
Ov-vg: A benchmark for open-vocabulary visual grounding
Open-vocabulary learning has emerged as a cutting-edge research area, particularly in light
of the widespread adoption of vision-based foundational models. Its primary objective is to …
Overcoming language priors in visual question answering with cumulative learning strategy
A Mao, F Chen, Z Ma, K Lin - Neurocomputing, 2024 - Elsevier
The performance of visual question answering (VQA) has witnessed great progress over the
last few years. However, many current VQA models tend to rely on superficial linguistic …
D3C2-Net: Dual-Domain Deep Convolutional Coding Network for Compressive Sensing
By mapping iterative optimization algorithms into neural networks (NNs), deep unfolding
networks (DUNs) exhibit well-defined and interpretable structures and achieve remarkable …
Enhancing Image Annotation with Object Tracking and Image Retrieval: A Systematic Review
R Fernandes, A Pessoa, M Salgado, A De Paiva… - IEEE …, 2024 - ieeexplore.ieee.org
Effective image and video annotation is a fundamental pillar in computer vision and artificial
intelligence, crucial for the development of accurate machine learning models. Object …
Late better than early: A decision-level information fusion approach for RGB-Thermal crowd counting with illumination awareness
In this paper, we make the first research effort to address the RGB-Thermal (RGB-T) crowd
counting problem with the decision-level late fusion manner. Being different from the existing …
Relation-aware Multi-pass Comparison Deconfounded Network for Change Captioning
Change captioning aims to describe the semantic change between a pair of images with
natural language while remaining immune to viewpoint change. Based on the encoder …
Exploiting multi-transformer encoder with multiple-hypothesis aggregation via diffusion model for 3D human pose estimation
S Arthanari, JH Jeong, YH Joo - Multimedia Tools and Applications, 2024 - Springer
The transformer architecture has consistently achieved cutting-edge performance in the task
of 2D to 3D lifting human pose estimation. Despite advances in transformer-based methods …
Deep Learning Techniques for Video Instance Segmentation: A Survey
Video instance segmentation, also known as multi-object tracking and segmentation, is an
emerging computer vision research area introduced in 2019, aiming at detecting …