There is more than meets the eye: Self-supervised multi-object detection and tracking with...

K Song, Y Zhao, L Huang, Y Yan, Q Meng - Engineering Applications of …, 2023 - Elsevier

Abstract RGB-Thermal infrared (RGB-T) image analysis has been actively studied in recent
years. In the past decade, it has received wide attention and made a lot of important …

Cited by 34 · Related articles · All 2 versions

Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org

Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Cited by 62 · Related articles · All 2 versions

When object detection meets knowledge distillation: A survey

Z Li, P Xu, X Chang, L Yang, Y Zhang… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Object detection (OD) is a crucial computer vision task that has seen the development of
many algorithms and models over the years. While the performance of current OD models …

Cited by 70 · Related articles · All 6 versions

Cocoa: Cross modality contrastive learning for sensor data

S Deldari, H Xue, A Saeed, DV Smith… - Proceedings of the ACM …, 2022 - dl.acm.org

Self-Supervised Learning (SSL) is a new paradigm for learning discriminative
representations without labeled data, and has reached comparable or even state-of-the-art …

Cited by 69 · Related articles · All 5 versions

Multimodal object detection via probabilistic ensembling

YT Chen, J Shi, Z Ye, C Mertz, D Ramanan… - European Conference on …, 2022 - Springer

Object detection with multimodal inputs can improve many safety-critical systems such as
autonomous vehicles (AVs). Motivated by AVs that operate in both day and night, we study …

Cited by 121 · Related articles · All 5 versions

D²-Net: Dual Disentanglement Network for Brain Tumor Segmentation With Missing Modalities

Q Yang, X Guo, Z Chen, PYM Woo… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Multi-modal Magnetic Resonance Imaging (MRI) can provide complementary information for
automatic brain tumor segmentation, which is crucial for diagnosis and prognosis. While …

Cited by 68 · Related articles · All 5 versions

Mix and localize: Localizing sound sources in mixtures

X Hu, Z Chen, A Owens - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com

We present a method for simultaneously localizing multiple sound sources within a visual
scene. This task requires a model to both group a sound mixture into individual sources, and …

Cited by 55 · Related articles · All 5 versions

Amodal panoptic segmentation

R Mohan, A Valada - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com

Humans have the remarkable ability to perceive objects as a whole, even when parts of
them are occluded. This ability of amodal perception forms the basis of our perceptual and …

Cited by 45 · Related articles · All 7 versions

Multimodal dataset distillation for image-text retrieval

X Wu, Z Deng, O Russakovsky - arXiv preprint arXiv:2308.07545, 2023 - arxiv.org

Dataset distillation methods offer the promise of reducing a large-scale dataset down to a
significantly smaller set of (potentially synthetic) training examples, which preserve sufficient …

Cited by 15 · Related articles · All 2 versions

Self-supervised predictive learning: A negative-free method for sound source localization in visual scenes

Z Song, Y Wang, J Fan, T Tan, Z Zhang - arXiv preprint arXiv:2203.13412, 2022 - arxiv.org

Sound source localization in visual scenes aims to localize objects emitting the sound in a
given image. Recent works showing impressive localization performance typically rely on …

Cited by 45 · Related articles · All 4 versions