RGB-T image analysis technology and application: A survey

K Song, Y Zhao, L Huang, Y Yan, Q Meng - Engineering Applications of …, 2023 - Elsevier
RGB-Thermal infrared (RGB-T) image analysis has been actively studied in recent
years. In the past decade, it has received wide attention and made a lot of important …

Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

When object detection meets knowledge distillation: A survey

Z Li, P Xu, X Chang, L Yang, Y Zhang… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Object detection (OD) is a crucial computer vision task that has seen the development of
many algorithms and models over the years. While the performance of current OD models …

Cocoa: Cross modality contrastive learning for sensor data

S Deldari, H Xue, A Saeed, DV Smith… - Proceedings of the ACM …, 2022 - dl.acm.org
Self-Supervised Learning (SSL) is a new paradigm for learning discriminative
representations without labeled data, and has reached comparable or even state-of-the-art …

Multimodal object detection via probabilistic ensembling

YT Chen, J Shi, Z Ye, C Mertz, D Ramanan… - European Conference on …, 2022 - Springer
Object detection with multimodal inputs can improve many safety-critical systems such as
autonomous vehicles (AVs). Motivated by AVs that operate in both day and night, we study …

D2-Net: Dual Disentanglement Network for Brain Tumor Segmentation With Missing Modalities

Q Yang, X Guo, Z Chen, PYM Woo… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Multi-modal Magnetic Resonance Imaging (MRI) can provide complementary information for
automatic brain tumor segmentation, which is crucial for diagnosis and prognosis. While …

Mix and localize: Localizing sound sources in mixtures

X Hu, Z Chen, A Owens - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
We present a method for simultaneously localizing multiple sound sources within a visual
scene. This task requires a model to both group a sound mixture into individual sources, and …

Amodal panoptic segmentation

R Mohan, A Valada - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
Humans have the remarkable ability to perceive objects as a whole, even when parts of
them are occluded. This ability of amodal perception forms the basis of our perceptual and …

Multimodal dataset distillation for image-text retrieval

X Wu, Z Deng, O Russakovsky - arXiv preprint arXiv:2308.07545, 2023 - arxiv.org
Dataset distillation methods offer the promise of reducing a large-scale dataset down to a
significantly smaller set of (potentially synthetic) training examples, which preserve sufficient …

Self-supervised predictive learning: A negative-free method for sound source localization in visual scenes

Z Song, Y Wang, J Fan, T Tan, Z Zhang - arXiv preprint arXiv:2203.13412, 2022 - arxiv.org
Sound source localization in visual scenes aims to localize objects emitting the sound in a
given image. Recent works showing impressive localization performance typically rely on …