The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

T Ding, T Chen, H Zhu, J Jiang, Y Zhong… - arXiv preprint arXiv …, 2023 - researchgate.net
The rapid growth of Large Language Models (LLMs) has been a driving force in
transforming various domains, reshaping the artificial general intelligence landscape …

Corpus Complexity Matters in Pretraining Language Models

A Agrawal, S Singh - Proceedings of The Fourth Workshop on …, 2023 - aclanthology.org
It is well known that filtering low-quality data before pretraining language models or
selecting suitable data from domains similar to downstream task datasets generally leads to …
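
As a concrete illustration of the first strategy mentioned in this snippet, below is a minimal sketch of perplexity-based quality filtering. The `score_fn` reference model, the threshold, and all names are illustrative assumptions, not this paper's method.

```python
import math

def perplexity(token_logprobs):
    # Per-token perplexity from a sequence of token log-probabilities.
    return math.exp(-sum(token_logprobs) / max(len(token_logprobs), 1))

def filter_low_quality(docs, score_fn, max_ppl=500.0):
    # Keep documents whose perplexity under a reference language model
    # falls below a threshold. score_fn stands in for that model, and
    # max_ppl is an arbitrary illustrative cutoff.
    return [doc for doc in docs if perplexity(score_fn(doc)) <= max_ppl]
```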

CLIMB: Curriculum Learning for Infant-inspired Model Building

RD Martinez, Z Goriely, H McGovern, C Davis… - arXiv preprint arXiv …, 2023 - arxiv.org
We describe our team's contribution to the STRICT-SMALL track of the BabyLM Challenge.
The challenge requires training a language model from scratch using only a relatively small …
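
For readers unfamiliar with the technique in the title: curriculum learning presents training data in order of increasing difficulty. A minimal sketch follows; using sentence length as the difficulty proxy is an illustrative assumption, not necessarily the paper's choice.

```python
def curriculum_order(examples, difficulty):
    # Sort training examples from easiest to hardest, the core move of
    # curriculum learning; the model then sees easy data first.
    return sorted(examples, key=difficulty)

corpus = [
    "a much longer sentence with clauses that a small model finds harder",
    "the cat sat",
    "dogs bark loudly",
]
# Illustrative difficulty proxy: raw length (an assumption here).
for text in curriculum_order(corpus, difficulty=len):
    print(text)
```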

Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

T Xia, L Ding, G Wan, Y Zhan, B Du, D Tao - arXiv preprint arXiv …, 2024 - arxiv.org
Answering complex logical queries over incomplete knowledge graphs (KGs) is challenging.
Most previous works have focused on learning entity/relation embeddings and simulating …

Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing

RD Martinez, Z Goriely, A Caines, P Buttery… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models strongly rely on frequency information because they maximize the
likelihood of tokens during pre-training. As a consequence, language models tend to not …
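
The mechanism named in this snippet is the standard next-token maximum-likelihood objective: every target token contributes equally to the cross-entropy loss, so frequent tokens dominate the aggregate gradient signal. A minimal PyTorch sketch of that objective (the toy embedding-plus-linear model is an assumption for illustration, not the paper's architecture):

```python
import torch
import torch.nn as nn

# Toy stand-in for a language model: embedding plus linear output head.
vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)

# A batch of token ids: shape (batch, seq_len).
tokens = torch.randint(0, vocab_size, (4, 16))

# Next-token prediction: inputs are tokens[:, :-1], targets tokens[:, 1:].
hidden = embed(tokens[:, :-1])   # (4, 15, d_model)
logits = lm_head(hidden)         # (4, 15, vocab_size)

# Maximizing token likelihood = minimizing cross-entropy over next tokens.
# Each target token counts once, so frequent tokens dominate the loss and
# its gradients -- the frequency bias this paper addresses.
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()
```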