A comprehensive review on 3D object detection and 6D pose estimation with deep learning

S Hoque, MY Arafat, S Xu, A Maiti, Y Wei - IEEE Access, 2021 - ieeexplore.ieee.org
Nowadays, computer vision with 3D (dimension) object detection and 6D (degree of
freedom) pose assumptions are widely discussed and studied in the field. In the 3D object …

Prototypical matching networks for video object segmentation

F Lin, Z Qiu, C Liu, T Yao, H Xie… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Semi-supervised video object segmentation is the task of segmenting the target in
sequential frames given the ground truth mask in the first frame. The modern approaches …

[PDF] Maximum exponential entropy segmentation method based on an improved sparrow search algorithm

马小晶, 贺航, 王宏伟, 田柯 - 科学技术与工程, 2023 - stae.com.cn
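Abstract: To address the reliance of the basic sparrow search algorithm (SSA) on the initial population and its low
solution accuracy, this paper proposes an improved sparrow optimization algorithm based on Circle chaotic mapping and random walk (improved sparrow …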

Action coherence network for weakly-supervised temporal action localization

Y Zhai, L Wang, W Tang, Q Zhang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Weakly-supervised Temporal Action Localization (W-TAL) aims at simultaneously classifying
and locating all action instances with only video-level supervision. However, current W-TAL …

Higher-order potentials for video object segmentation in bilateral space

C Hao, Y Chen, ZX Yang, E Wu - Neurocomputing, 2020 - Elsevier
We propose an effective approach to make segmentation for objects in videos with an initial
input of the object masks in a few frames of the source video. In this method, we cast the …

Leveraging spatial residual attention and temporal Markov networks for video action understanding

Y Xu, Z Wang, X Zhang - Neural Networks, 2024 - Elsevier
The effective use of temporal relationships while extracting fertile spatial features is the key
to video action understanding. Video action understanding is a challenging visual task …

D2T: A Framework For transferring detection to tracking

H Qin, C Yu, C Gao, N Sang - Pattern Recognition, 2022 - Elsevier
Object detection methods draw increasing attention in deep learning based visual tracking
algorithms due to their robust discrimination and powerful regression ability. To further …

Calibrank: Effective LiDAR-camera extrinsic calibration by multi-modal learning to rank

X Wu, C Zhang, Y Liu - 2020 IEEE International Conference on …, 2020 - ieeexplore.ieee.org
Precise and online LiDAR-camera extrinsic calibration is one of the prerequisites of multi-
modal data fusion for autonomous perception. The existing 6-DoF pose regression networks …

Video object segmentation via couple streams and feature memory

Y Liang, X Xiao, S Qiu, Y Zhang, Z Su - IET Image Processing, 2024 - Wiley Online Library
In recent years, most video segmentation methods use deep CNN to process the input
image, but they did not fully mine the rich intermediate predictions in spatio‐temporal space …

[PDF] Object Affordances Graph Network for Action Recognition.

H Tan, L Wang, Q Zhang, Z Gao, N Zheng… - BMVC, 2019 - qilin-zhang.github.io
Human actions often involve interactions with objects, and such action possibilities of
objects were termed “affordances” in human-computer interaction (HCI) literature. To …