Active learning for deep visual tracking

D Yuan, X Chang, Q Liu, Y Yang… - … on Neural Networks …, 2023 - ieeexplore.ieee.org
Convolutional neural networks (CNNs) have been successfully applied to the single-target
tracking task in recent years. Generally, training a deep CNN model requires numerous …

5G mmWave cooperative positioning and mapping using multi-model PHD filter and map fusion

H Kim, K Granström, L Gao, G Battistelli… - IEEE Transactions …, 2020 - ieeexplore.ieee.org
5G millimeter wave (mmWave) signals can enable accurate positioning in vehicular
networks when the base station and vehicles are equipped with large antenna arrays …

Multi-ACCDOA: Localizing and detecting overlapping sounds from the same class with auxiliary duplicating permutation invariant training

K Shimada, Y Koyama, S Takahashi… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Sound event localization and detection (SELD) involves identifying the direction-of-arrival
(DOA) and the event class. The SELD methods with a class-wise output format make the …

Deep instance segmentation with automotive radar detection points

J Liu, W Xiong, L Bai, Y Xia, T Huang… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Automotive radar provides reliable environmental perception in all-weather conditions with
affordable cost, but it hardly supplies semantic and geometry information due to the sparsity …

BAVS: bootstrapping audio-visual segmentation by integrating foundation knowledge

C Liu, P Li, H Zhang, L Li, Z Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding
sources by predicting pixel-wise maps. Previous methods assume that each sound …

Audio-visual cross-attention network for robotic speaker tracking

X Qian, Z Wang, J Wang, G Guan… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Audio-visual signals can be used jointly for robotic perception as they complement each
other. Such multi-modal sensory fusion has a clear advantage, especially under noisy …

Audio-visual event localization by learning spatial and semantic co-attention

C Xue, X Zhong, M Cai, H Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
This work aims to temporally localize events that are both audible and visible in video.
Previous methods mainly focused on temporal modeling of events with simple fusion of …

Heterogeneous multi-sensor fusion with random finite set multi-object densities

W Yi, L Chai - IEEE Transactions on Signal Processing, 2021 - ieeexplore.ieee.org
This paper addresses density-based multi-sensor cooperative fusion using random finite
set (RFS) type multi-object densities (MODs). Existing fusion methods use scalar weights to …

Multi-modal perception attention network with self-supervised learning for audio-visual speaker tracking

Y Li, H Liu, H Tang - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Multi-modal fusion has proven to be an effective way to improve the accuracy and
robustness of speaker tracking, especially in complex scenarios. However, how to combine …

Occlusion-robust online multi-object visual tracking using a GM-PHD filter with CNN-based re-identification

NL Baisa - Journal of Visual Communication and Image …, 2021 - Elsevier
We propose a novel online multi-object visual tracker using a Gaussian mixture Probability
Hypothesis Density (GM-PHD) filter and deep appearance learning. The GM-PHD filter has …