Dealing with missing modalities in the visual question answer-difference prediction task...

Y Oh, DJ Kim, IS Kweon - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com

The capability of the traditional semi-supervised learning (SSL) methods is far from real-world
application due to severely biased pseudo-labels caused by (1) class imbalance and …

Cited by 83 Related articles All 11 versions

MCDAL: Maximum classifier discrepancy for active learning

JW Cho, DJ Kim, Y Jung… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Recent state-of-the-art active learning methods have mostly leveraged generative
adversarial networks (GANs) for sample acquisition; however, GAN is usually known to …

Cited by 43 Related articles All 8 versions

Generative bias for robust visual question answering

JW Cho, DJ Kim, H Ryu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

The task of Visual Question Answering (VQA) is known to be plagued by the issue
of VQA models exploiting biases within the dataset to make its final prediction. Various …

Cited by 18 Related articles All 10 versions

Visible-infrared person re-identification using privileged intermediate information

M Alehdaghi, A Josi, RMO Cruz, E Granger - European Conference on …, 2022 - Springer

Visible-infrared person re-identification (ReID) aims to recognize a same person of interest
across a network of RGB and IR cameras. Some deep learning (DL) models have directly …

Cited by 18 Related articles All 5 versions

CLIP-TD: CLIP targeted distillation for vision-language tasks

Z Wang, N Codella, YC Chen, L Zhou, J Yang… - arXiv preprint arXiv …, 2022 - arxiv.org

Contrastive language-image pretraining (CLIP) links vision and language modalities into a
unified embedding space, yielding the tremendous potential for vision-language (VL) tasks …

Cited by 20 Related articles All 3 versions

Dense relational image captioning via multi-task triple-stream networks

DJ Kim, TH Oh, J Choi, IS Kweon - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

We introduce dense relational captioning, a novel image captioning task which aims to
generate multiple captions with respect to relational information between objects in a visual …

Cited by 33 Related articles All 8 versions

Enhancing modality-agnostic representations via meta-learning for brain tumor segmentation

A Konwer, X Hu, J Bae, X Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com

In medical vision, different imaging modalities provide complementary information. However,
in practice, not all modalities may be available during inference or even training. Previous …

Cited by 8 Related articles All 5 versions

ACP++: Action co-occurrence priors for human-object interaction detection

DJ Kim, X Sun, J Choi, S Lin… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

A common problem in the task of human-object interaction (HOI) detection is that numerous
HOI classes have only a small number of labeled examples, resulting in training sets with a …

Cited by 19 Related articles All 8 versions

Signing outside the studio: Benchmarking background robustness for continuous sign language recognition

Y Jang, Y Oh, JW Cho, DJ Kim, JS Chung… - arXiv preprint arXiv …, 2022 - arxiv.org

The goal of this work is background-robust continuous sign language recognition. Most
existing Continuous Sign Language Recognition (CSLR) benchmarks have fixed …

Cited by 8 Related articles All 9 versions

Towards robust multimodal sentiment analysis under uncertain signal missing

M Li, D Yang, L Zhang - IEEE Signal Processing Letters, 2023 - ieeexplore.ieee.org

Multimodal Sentiment Analysis (MSA) has attracted widespread research attention recently.
Most MSA studies are based on the assumption of signal completeness. However, many …

Cited by 13 Related articles All 2 versions