The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

T Ding, T Chen, H Zhu, J Jiang, Y Zhong… - arXiv preprint arXiv …, 2023 - researchgate.net
The rapid growth of Large Language Models (LLMs) has been a driving force in
transforming various domains, reshaping the artificial general intelligence landscape …

Corpus Complexity Matters in Pretraining Language Models

A Agrawal, S Singh - Proceedings of The Fourth Workshop on …, 2023 - aclanthology.org
It is well known that filtering low-quality data before pretraining language models or
selecting suitable data from domains similar to downstream task datasets generally leads to …
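
As a concrete illustration of the first strategy mentioned in this snippet, below is a minimal sketch of perplexity-based quality filtering. The `score_fn` reference model, the threshold, and all names are illustrative assumptions, not this paper's method.

```python
import math

def perplexity(token_logprobs):
    # Per-token perplexity from a sequence of token log-probabilities.
    return math.exp(-sum(token_logprobs) / max(len(token_logprobs), 1))

def filter_low_quality(docs, score_fn, max_ppl=500.0):
    # Keep documents whose perplexity under a reference language model
    # falls below a threshold. score_fn stands in for that model, and
    # max_ppl is an arbitrary illustrative cutoff.
    return [doc for doc in docs if perplexity(score_fn(doc)) <= max_ppl]
```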

CLIMB: Curriculum Learning for Infant-inspired Model Building

RD Martinez, Z Goriely, H McGovern, C Davis… - arXiv preprint arXiv …, 2023 - arxiv.org
We describe our team's contribution to the STRICT-SMALL track of the BabyLM Challenge.
The challenge requires training a language model from scratch using only a relatively small …
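
For readers unfamiliar with the technique in the title: curriculum learning presents training data in order of increasing difficulty. A minimal sketch follows; using sentence length as the difficulty proxy is an illustrative assumption, not necessarily the paper's choice.

```python
def curriculum_order(examples, difficulty):
    # Sort training examples from easiest to hardest, the core move of
    # curriculum learning; the model then sees easy data first.
    return sorted(examples, key=difficulty)

corpus = [
    "a much longer sentence with clauses that a small model finds harder",
    "the cat sat",
    "dogs bark loudly",
]
# Illustrative difficulty proxy: raw length (an assumption here).
for text in curriculum_order(corpus, difficulty=len):
    print(text)
```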

Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning

T Xia, L Ding, G Wan, Y Zhan, B Du, D Tao - arXiv preprint arXiv …, 2024 - arxiv.org
Answering complex logical queries over incomplete knowledge graphs (KGs) is challenging.
Most previous works have focused on learning entity/relation embeddings and simulating …

Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing

RD Martinez, Z Goriely, A Caines, P Buttery… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models strongly rely on frequency information because they maximize the
likelihood of tokens during pre-training. As a consequence, language models tend to not …
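
The mechanism named in this snippet is the standard next-token maximum-likelihood objective: every target token contributes equally to the cross-entropy loss, so frequent tokens dominate the aggregate gradient signal. A minimal PyTorch sketch of that objective (the toy embedding-plus-linear model is an assumption for illustration, not the paper's architecture):

```python
import torch
import torch.nn as nn

# Toy stand-in for a language model: embedding plus linear output head.
vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)

# A batch of token ids: shape (batch, seq_len).
tokens = torch.randint(0, vocab_size, (4, 16))

# Next-token prediction: inputs are tokens[:, :-1], targets tokens[:, 1:].
hidden = embed(tokens[:, :-1])   # (4, 15, d_model)
logits = lm_head(hidden)         # (4, 15, vocab_size)

# Maximizing token likelihood = minimizing cross-entropy over next tokens.
# Each target token counts once, so frequent tokens dominate the loss and
# its gradients -- the frequency bias this paper addresses.
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()
```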