NICO++: Towards better benchmarking for domain generalization

X Zhang, Y He, R Xu, H Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite the remarkable performance that modern deep neural networks have achieved on
independent and identically distributed (IID) data, they can crash under distribution shifts …

How many unicorns are in this image? A safety evaluation benchmark for vision LLMs

H Tu, C Cui, Z Wang, Y Zhou, B Zhao, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org
This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning. Different
from prior studies, we shift our focus from evaluating standard performance to introducing a …

COCO-O: A benchmark for object detectors under natural distribution shifts

X Mao, Y Chen, Y Zhu, D Chen, H Su… - Proceedings of the …, 2023 - openaccess.thecvf.com
Practical object detection application can lose its effectiveness on image inputs with natural
distribution shifts. This problem leads the research community to pay more attention on the …

Fourier-based augmentation with applications to domain generalization

Q Xu, R Zhang, Z Fan, Y Wang, YY Wu, Y Zhang - Pattern Recognition, 2023 - Elsevier
When deployed on a new domain different from the training set, deep learning often suffers
from severe performance degradation. To combat domain shift, domain adaptation and …

Industrial anomaly detection with domain shift: A real-world dataset and masked multi-scale reconstruction

Z Zhang, Z Zhao, X Zhang, C Sun, X Chen - Computers in Industry, 2023 - Elsevier
Industrial anomaly detection (IAD) is crucial for automating industrial quality inspection. The
diversity of the datasets is the foundation for developing comprehensive IAD algorithms …

VHELM: A holistic evaluation of vision language models

T Lee, H Tu, CH Wong, W Zheng, Y Zhou, Y Mai… - arXiv preprint arXiv …, 2024 - arxiv.org
Current benchmarks for assessing vision-language models (VLMs) often focus on their
perception or problem-solving capabilities and neglect other critical aspects such as …

Sight beyond text: Multi-modal training enhances LLMs in truthfulness and ethics

H Tu, B Zhao, C Wei, C Xie - arXiv preprint arXiv:2309.07120, 2023 - arxiv.org
Multi-modal large language models (MLLMs) are trained based on large language models
(LLM), with an enhanced capability to comprehend multi-modal inputs and generate textual …

Unsupervised camouflaged object segmentation as domain adaptation

Y Zhang, C Wu - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Deep learning for unsupervised image segmentation remains challenging due to the
absence of human labels. The common idea is to train a segmentation head, with the …

OOD-CV-v2: An extended benchmark for robustness to out-of-distribution shifts of individual nuisances in natural images

B Zhao, J Wang, W Ma, A Jesslen… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One
reason is that existing robustness benchmarks are limited, as they either rely on synthetic …

3D adversarial augmentations for robust out-of-domain predictions

A Lehner, S Gasperini, A Marcos-Ramiro… - International Journal of …, 2024 - Springer
Since real-world training datasets cannot properly sample the long tail of the underlying data
distribution, corner cases and rare out-of-domain samples can severely hinder the …