Video captioning using global-local representation

A Vijayakumar, S Vairavasundaram - Multimedia Tools and Applications, 2024 - Springer

In computer vision, object detection is the classical and most challenging problem to get
accurate results in detecting objects. With the significant advancement of deep learning …

Cited by 72 Related articles

Differential feature awareness network within antagonistic learning for infrared-visible object detection

R Zhang, L Li, Q Zhang, J Zhang, L Xu… - … on Circuits and …, 2023 - ieeexplore.ieee.org

The combination of infrared and visible videos aims to gather more comprehensive feature
information from multiple sources and reach superior results on various practical tasks, such …

Cited by 67 Related articles

[PDF] arxiv.org

Solve the puzzle of instance segmentation in videos: A weakly supervised framework with spatio-temporal collaboration

L Yan, Q Wang, S Ma, J Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Instance segmentation in videos, which aims to segment and track multiple objects in video
frames, has garnered a flurry of research attention in recent years. In this paper, we present …

Cited by 59 Related articles All 3 versions

[PDF] ijcai.org

[PDF] Prompt Learns Prompt: Exploring Knowledge-Aware Generative Prompt Collaboration For Video Captioning.

L Yan, C Han, Z Xu, D Liu, Q Wang - IJCAI, 2023 - ijcai.org

Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches
have been introduced to learn fixed textual or visual prompts while freezing the pre-trained …

Cited by 40 Related articles All 3 versions

Holistic prototype attention network for few-shot video object segmentation

Y Tang, T Chen, X Jiang, Y Yao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of unseen
classes by resorting to a small set of support images that contain pixel-level object …

Cited by 15 Related articles

[PDF] cardiff.ac.uk

Dual-constraint coarse-to-fine network for camouflaged object detection

G Yue, H Xiao, H Xie, T Zhou, W Zhou… - … on Circuits and …, 2023 - ieeexplore.ieee.org

Camouflaged object detection (COD) is an important yet challenging task, with great
application values in industrial defect detection, medical care, etc. The challenges mainly …

Cited by 17 Related articles All 2 versions

Feature fusion Vision Transformers using MLP-Mixer for enhanced deepfake detection

E Essa - Neurocomputing, 2024 - Elsevier

Deepfake technology, utilizing deep learning and computer vision, presents significant
security threats by generating highly realistic synthetic media, such as images and videos. In …

Cited by 3 Related articles All 2 versions

[HTML] sciencedirect.com

[HTML] OV-VG: A benchmark for open-vocabulary visual grounding

C Wang, W Feng, X Li, G Cheng, S Lyu, B Liu, L Chen… - Neurocomputing, 2024 - Elsevier

Open-vocabulary learning has emerged as a cutting-edge research area, particularly in light
of the widespread adoption of vision-based foundational models. Its primary objective is to …

Cited by 6 Related articles All 3 versions

Semantic-aware Contrastive Learning with Proposal Suppression for Video Semantic Role Grounding

M Liu, D Zhou, J Guo, X Luo, Z Gao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Video semantic role grounding has gained substantial interest from both the academic and
industrial communities. While existing methods have demonstrated considerable …

Cited by 4 Related articles

[PDF] arxiv.org

CML-MOTS: Collaborative multi-task learning for multi-object tracking and segmentation

Y Cui, C Han, D Liu - arXiv preprint arXiv:2311.00987, 2023 - arxiv.org

The advancement of computer vision has pushed visual analysis tasks from still images to
the video domain. In recent years, video instance segmentation, which aims to track and …

Cited by 4 Related articles All 2 versions