Solve the puzzle of instance segmentation in videos: A weakly supervised framework with spatio-te...

A Vijayakumar, S Vairavasundaram - Multimedia Tools and Applications, 2024 - Springer

In computer vision, object detection is the classical and most challenging problem to get
accurate results in detecting objects. With the significant advancement of deep learning …

Cited by 79 · Related articles

Feature fusion Vision Transformers using MLP-Mixer for enhanced deepfake detection

E Essa - Neurocomputing, 2024 - Elsevier

Deepfake technology, utilizing deep learning and computer vision, presents significant
security threats by generating highly realistic synthetic media, such as images and videos. In …

Cited by 7 · Related articles · All 2 versions

[HTML] sciencedirect.com

[HTML] Ov-vg: A benchmark for open-vocabulary visual grounding

C Wang, W Feng, X Li, G Cheng, S Lyu, B Liu, L Chen… - Neurocomputing, 2024 - Elsevier

Open-vocabulary learning has emerged as a cutting-edge research area, particularly in light
of the widespread adoption of vision-based foundational models. Its primary objective is to …

Cited by 7 · Related articles · All 3 versions

[PDF] ssrn.com

Overcoming language priors in visual question answering with cumulative learning strategy

A Mao, F Chen, Z Ma, K Lin - Neurocomputing, 2024 - Elsevier

The performance of visual question answering (VQA) has witnessed great progress over the
last few years. However, many current VQA models tend to rely on superficial linguistic …

Cited by 2 · Related articles

[PDF] arxiv.org

D³C²-Net: Dual-Domain Deep Convolutional Coding Network for Compressive Sensing

W Li, B Chen, S Liu, S Zhao, B Du… - … on Circuits and …, 2024 - ieeexplore.ieee.org

By mapping iterative optimization algorithms into neural networks (NNs), deep unfolding
networks (DUNs) exhibit well-defined and interpretable structures and achieve remarkable …

Cited by 17 · Related articles · All 4 versions

[PDF] ieee.org

Enhancing Image Annotation with Object Tracking and Image Retrieval: A Systematic Review

R Fernandes, A Pessoa, M Salgado, A De Paiva… - IEEE …, 2024 - ieeexplore.ieee.org

Effective image and video annotation is a fundamental pillar in computer vision and artificial
intelligence, crucial for the development of accurate machine learning models. Object …

Cited by 4 · Related articles

[PDF] ssrn.com

Late better than early: A decision-level information fusion approach for RGB-Thermal crowd counting with illumination awareness

J Cheng, C Feng, Y Xiao, Z Cao - Neurocomputing, 2024 - Elsevier

In this paper, we make the first research effort to address the RGB-Thermal (RGB-T) crowd
counting problem with the decision-level late fusion manner. Being different from the existing …

Cited by 2 · Related articles · All 2 versions

Relation-aware Multi-pass Comparison Deconfounded Network for Change Captioning

Z Lu, L Jin, Z Chen, C Tian, X Sun, X Li… - … on Circuits and …, 2024 - ieeexplore.ieee.org

Change captioning aims to describe the semantic change between a pair of images with
natural language while remaining immune to viewpoint change. Based on the encoder …

Cited by 1 · Related articles

Exploiting multi-transformer encoder with multiple-hypothesis aggregation via diffusion model for 3D human pose estimation

S Arthanari, JH Jeong, YH Joo - Multimedia Tools and Applications, 2024 - Springer

The transformer architecture has consistently achieved cutting-edge performance in the task
of 2D to 3D lifting human pose estimation. Despite advances in transformer-based methods …

Cited by 1 · Related articles

[PDF] arxiv.org

Deep Learning Techniques for Video Instance Segmentation: A Survey

C Xu, CT Li, Y Hu, CP Lim, D Creighton - arXiv preprint arXiv:2310.12393, 2023 - arxiv.org

Video instance segmentation, also known as multi-object tracking and segmentation, is an
emerging computer vision research area introduced in 2019, aiming at detecting …

Cited by 1 · Related articles · All 2 versions