KMMLU: Measuring Massive Multitask Language Understanding in Korean

G Son, H Lee, S Kim, S Kim, N Muennighoff… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose KMMLU, a new Korean benchmark with 35,030 expert-level multiple-choice
questions across 45 subjects ranging from humanities to STEM. Unlike previous Korean …

LLM-as-a-Judge & Reward Model: What They Can and Cannot Do

G Son, H Ko, H Lee, Y Kim, S Hong - arXiv preprint arXiv:2409.11239, 2024 - arxiv.org
LLM-as-a-Judge and reward models are widely used alternatives to multiple-choice
questions or human annotators for large language model (LLM) evaluation. Their efficacy …

This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish

L Augustyniak, K Tagowski… - Advances in …, 2022 - proceedings.neurips.cc
The availability of compute and data to train larger and larger language models increases
the demand for robust methods of benchmarking the true progress of LM training. Recent …

HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models

G Son, H Lee, S Kim, H Kim, J Lee, JW Yeom… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) trained on massive corpora demonstrate impressive
capabilities in a wide range of tasks. While there are ongoing efforts to adapt these models …

CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean

E Kim, J Suk, P Oh, H Yoo, J Thorne, A Oh - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the rapid development of large language models (LLMs) for the Korean language,
there remains an obvious lack of benchmark datasets that test the requisite Korean cultural …

Improving Language Models' Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary

ME Jang, T Lukasiewicz - arXiv preprint arXiv:2310.15541, 2023 - arxiv.org
The non-humanlike behaviour of contemporary pre-trained language models (PLMs) is a
leading cause undermining their trustworthiness. A striking phenomenon of such faulty …

Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models

S Kim, S Choi, M Jeong - arXiv preprint arXiv:2402.14714, 2024 - arxiv.org
This report introduces EEVE-Korean-v1.0, a Korean adaptation of large language
models that exhibit remarkable capabilities across English and Korean text understanding …

Open Korean Corpora: A Practical Report

WI Cho, S Moon, Y Song - arXiv preprint arXiv:2012.15621, 2020 - arxiv.org
Korean is often referred to as a low-resource language in the research community. While
this claim is partially true, it is also because the availability of resources is inadequately …

KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark

S Jang, S Lee, H Yu - arXiv preprint arXiv:2402.17377, 2024 - arxiv.org
As language models are often deployed as chatbot assistants, it becomes a virtue for
models to engage in conversations in a user's first language. While these models are trained …

Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation

T Gauthier-Caron, S Siriwardhana, E Stein… - arXiv preprint arXiv …, 2024 - arxiv.org
By merging models, AI systems can combine the distinct strengths of separate language
models, achieving a balance between multiple capabilities without requiring substantial …