A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

A comprehensive survey on test-time adaptation under distribution shifts

J Liang, R He, T Tan - International Journal of Computer Vision, 2024 - Springer
Machine learning methods strive to acquire a robust model during the training
process that can effectively generalize to test samples, even in the presence of distribution …

Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging

S Azizi, L Culp, J Freyberg, B Mustafa, S Baur… - Nature Biomedical …, 2023 - nature.com
Machine-learning models for medical tasks can match or surpass the performance
of clinical experts. However, in settings differing from those of the training dataset, the …

Towards out-of-distribution generalization: A survey

J Liu, Z Shen, Y He, X Zhang, R Xu, H Yu… - arXiv preprint arXiv …, 2021 - arxiv.org
Traditional machine learning paradigms are based on the assumption that both training and
test data follow the same statistical pattern, which is mathematically referred to as …

Teaching models to express their uncertainty in words

S Lin, J Hilton, O Evans - arXiv preprint arXiv:2205.14334, 2022 - arxiv.org
We show that a GPT-3 model can learn to express uncertainty about its own answers in
natural language--without use of model logits. When given a question, the model generates …

Exact feature distribution matching for arbitrary style transfer and domain generalization

Y Zhang, M Li, R Li, K Jia… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Arbitrary style transfer (AST) and domain generalization (DG) are important yet challenging
visual learning tasks, which can be cast as a feature distribution matching problem. With the …

Robust test-time adaptation in dynamic scenarios

L Yuan, B Xie, S Li - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Test-time adaptation (TTA) intends to adapt the pretrained model to test distributions with
only unlabeled test data streams. Most of the previous TTA methods have achieved great …

Single-source domain expansion network for cross-scene hyperspectral image classification

Y Zhang, W Li, W Sun, R Tao… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Currently, cross-scene hyperspectral image (HSI) classification has drawn increasing
attention. It is necessary to train a model only on source domain (SD) and directly …

Self-supervised learning for videos: A survey

MC Schiappa, YS Rawat, M Shah - ACM Computing Surveys, 2023 - dl.acm.org
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …

Trustworthy AI: From principles to practices

B Li, P Qi, B Liu, S Di, J Liu, J Pei, J Yi… - ACM Computing Surveys, 2023 - dl.acm.org
The rapid development of Artificial Intelligence (AI) technology has enabled the deployment
of various systems based on it. However, many current AI systems are found vulnerable to …