RGB-T image analysis technology and application: A survey
RGB-Thermal infrared (RGB-T) image analysis has been actively studied in recent
years. In the past decade, it has received wide attention and made a lot of important …
Learning in audio-visual context: A review, analysis, and new perspective
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …
When object detection meets knowledge distillation: A survey
Object detection (OD) is a crucial computer vision task that has seen the development of
many algorithms and models over the years. While the performance of current OD models …
Cocoa: Cross modality contrastive learning for sensor data
Self-Supervised Learning (SSL) is a new paradigm for learning discriminative
representations without labeled data, and has reached comparable or even state-of-the-art …
Multimodal object detection via probabilistic ensembling
Object detection with multimodal inputs can improve many safety-critical systems such as
autonomous vehicles (AVs). Motivated by AVs that operate in both day and night, we study …
D2-Net: Dual Disentanglement Network for Brain Tumor Segmentation With Missing Modalities
Multi-modal Magnetic Resonance Imaging (MRI) can provide complementary information for
automatic brain tumor segmentation, which is crucial for diagnosis and prognosis. While …
Mix and localize: Localizing sound sources in mixtures
We present a method for simultaneously localizing multiple sound sources within a visual
scene. This task requires a model to both group a sound mixture into individual sources, and …
Amodal panoptic segmentation
Humans have the remarkable ability to perceive objects as a whole, even when parts of
them are occluded. This ability of amodal perception forms the basis of our perceptual and …
Multimodal dataset distillation for image-text retrieval
Dataset distillation methods offer the promise of reducing a large-scale dataset down to a
significantly smaller set of (potentially synthetic) training examples, which preserve sufficient …
Self-supervised predictive learning: A negative-free method for sound source localization in visual scenes
Sound source localization in visual scenes aims to localize objects emitting the sound in a
given image. Recent works showing impressive localization performance typically rely on …