KMMLU: Measuring massive multitask language understanding in Korean
We propose KMMLU, a new Korean benchmark with 35,030 expert-level multiple-choice
questions across 45 subjects ranging from humanities to STEM. Unlike previous Korean …
LLM-as-a-Judge & reward model: What they can and cannot do
LLM-as-a-Judge and reward models are widely used alternatives to multiple-choice
questions or human annotators for large language model (LLM) evaluation. Their efficacy …
This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish
L Augustyniak, K Tagowski… - Advances in …, 2022 - proceedings.neurips.cc
The availability of compute and data to train larger and larger language models increases
the demand for robust methods of benchmarking the true progress of LM training. Recent …
HAE-RAE Bench: Evaluation of Korean knowledge in language models
Large language models (LLMs) trained on massive corpora demonstrate impressive
capabilities in a wide range of tasks. While there are ongoing efforts to adapt these models …
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean
Despite the rapid development of large language models (LLMs) for the Korean language,
there remains an obvious lack of benchmark datasets that test the requisite Korean cultural …
Improving Language Models' Meaning Understanding and Consistency by Learning Conceptual Roles from Dictionary
ME Jang, T Lukasiewicz - arXiv preprint arXiv:2310.15541, 2023 - arxiv.org
The non-humanlike behaviour of contemporary pre-trained language models (PLMs) is a
leading cause undermining their trustworthiness. A striking phenomenon of such faulty …
Efficient and effective vocabulary expansion towards multilingual large language models
S Kim, S Choi, M Jeong - arXiv preprint arXiv:2402.14714, 2024 - arxiv.org
This report introduces EEVE-Korean-v1.0, a Korean adaptation of large language
models that exhibit remarkable capabilities across English and Korean text understanding …
Open Korean corpora: A practical report
Korean is often referred to as a low-resource language in the research community. While
this claim is partially true, it is also because the availability of resources is inadequately …
KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark
As language models are often deployed as chatbot assistants, it becomes a virtue for
models to engage in conversations in a user's first language. While these models are trained …
Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation
T Gauthier-Caron, S Siriwardhana, E Stein… - arXiv preprint arXiv …, 2024 - arxiv.org
By merging models, AI systems can combine the distinct strengths of separate language
models, achieving a balance between multiple capabilities without requiring substantial …