Active learning for deep visual tracking

D Yuan, X Chang, Q Liu, Y Yang… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Convolutional neural networks (CNNs) have been successfully applied to the single-target
tracking task in recent years. Generally, training a deep CNN model requires numerous …

5G mmWave cooperative positioning and mapping using multi-model PHD filter and map fusion

H Kim, K Granström, L Gao, G Battistelli… - IEEE Transactions …, 2020 - ieeexplore.ieee.org
5G millimeter wave (mmWave) signals can enable accurate positioning in vehicular
networks when the base station and vehicles are equipped with large antenna arrays …

Multi-ACCDOA: Localizing and detecting overlapping sounds from the same class with auxiliary duplicating permutation invariant training

K Shimada, Y Koyama, S Takahashi… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Sound event localization and detection (SELD) involves identifying the direction-of-arrival
(DOA) and the event class. The SELD methods with a class-wise output format make the …

Deep instance segmentation with automotive radar detection points

J Liu, W Xiong, L Bai, Y Xia, T Huang… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Automotive radar provides reliable environmental perception in all-weather conditions with
affordable cost, but it hardly supplies semantic and geometry information due to the sparsity …

BAVS: bootstrapping audio-visual segmentation by integrating foundation knowledge

C Liu, P Li, H Zhang, L Li, Z Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding
sources by predicting pixel-wise maps. Previous methods assume that each sound …

Audio-visual cross-attention network for robotic speaker tracking

X Qian, Z Wang, J Wang, G Guan… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Audio-visual signals can be used jointly for robotic perception as they complement each
other. Such multi-modal sensory fusion has a clear advantage, especially under noisy …

Audio-visual event localization by learning spatial and semantic co-attention

C Xue, X Zhong, M Cai, H Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
This work aims to temporally localize events that are both audible and visible in video.
Previous methods mainly focused on temporal modeling of events with simple fusion of …

Heterogeneous multi-sensor fusion with random finite set multi-object densities

W Yi, L Chai - IEEE Transactions on Signal Processing, 2021 - ieeexplore.ieee.org
This paper addresses density-based multi-sensor cooperative fusion using random finite
set (RFS) type multi-object densities (MODs). Existing fusion methods use scalar weights to …

Multi-modal perception attention network with self-supervised learning for audio-visual speaker tracking

Y Li, H Liu, H Tang - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Multi-modal fusion has proven to be an effective way to improve the accuracy and
robustness of speaker tracking, especially in complex scenarios. However, how to combine …

Occlusion-robust online multi-object visual tracking using a GM-PHD filter with CNN-based re-identification

NL Baisa - Journal of Visual Communication and Image …, 2021 - Elsevier
We propose a novel online multi-object visual tracker using a Gaussian mixture Probability
Hypothesis Density (GM-PHD) filter and deep appearance learning. The GM-PHD filter has …