Active Testing of Large Language Model via Multi-Stage Sampling

Y Huang, J Song, Q Hu, F Juefei-Xu, L Ma - arXiv preprint arXiv …, 2024 - arxiv.org
Performance evaluation plays a crucial role in the development life cycle of large language
models (LLMs). It estimates the model's capability, elucidates behavior characteristics, and …