Active learning for deep visual tracking
Convolutional neural networks (CNNs) have been successfully applied to the single target
tracking task in recent years. Generally, training a deep CNN model requires numerous …
5G mmWave cooperative positioning and mapping using multi-model PHD filter and map fusion
5G millimeter wave (mmWave) signals can enable accurate positioning in vehicular
networks when the base station and vehicles are equipped with large antenna arrays …
Multi-ACCDOA: Localizing and detecting overlapping sounds from the same class with auxiliary duplicating permutation invariant training
K Shimada, Y Koyama, S Takahashi… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Sound event localization and detection (SELD) involves identifying the direction-of-arrival
(DOA) and the event class. The SELD methods with a class-wise output format make the …
Deep instance segmentation with automotive radar detection points
Automotive radar provides reliable environmental perception in all-weather conditions with
affordable cost, but it hardly supplies semantic and geometry information due to the sparsity …
BAVS: bootstrapping audio-visual segmentation by integrating foundation knowledge
Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding
sources by predicting pixel-wise maps. Previous methods assume that each sound …
Audio-visual cross-attention network for robotic speaker tracking
Audio-visual signals can be used jointly for robotic perception as they complement each
other. Such multi-modal sensory fusion has a clear advantage, especially under noisy …
Audio-visual event localization by learning spatial and semantic co-attention
C Xue, X Zhong, M Cai, H Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
This work aims to temporally localize events that are both audible and visible in video.
Previous methods mainly focused on temporal modeling of events with simple fusion of …
Heterogeneous multi-sensor fusion with random finite set multi-object densities
W Yi, L Chai - IEEE Transactions on Signal Processing, 2021 - ieeexplore.ieee.org
This paper addresses the density based multi-sensor cooperative fusion using random finite
set (RFS) type multi-object densities (MODs). Existing fusion methods use scalar weights to …
Multi-modal perception attention network with self-supervised learning for audio-visual speaker tracking
Multi-modal fusion is proven to be an effective method to improve the accuracy and
robustness of speaker tracking, especially in complex scenarios. However, how to combine …
Occlusion-robust online multi-object visual tracking using a GM-PHD filter with CNN-based re-identification
NL Baisa - Journal of Visual Communication and Image …, 2021 - Elsevier
We propose a novel online multi-object visual tracker using a Gaussian mixture Probability
Hypothesis Density (GM-PHD) filter and deep appearance learning. The GM-PHD filter has …