State-of-the-art generalisation research in NLP: a taxonomy and review

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …

Findings of the BabyLM Challenge: Sample-efficient pretraining on developmentally plausible corpora

A Warstadt, A Mueller, L Choshen… - … of the BabyLM …, 2023 - research-collection.ethz.ch
Children can acquire language from less than 100 million words of input. Large language
models are far less data-efficient: they typically require 3 or 4 orders of magnitude more data …

A theory of emergent in-context learning as implicit structure induction

M Hahn, N Goyal - arXiv preprint arXiv:2303.07971, 2023 - arxiv.org
Scaling large language models (LLMs) leads to an emergent capacity to learn in-context
from example demonstrations. Despite progress, theoretical understanding of this …

Unit testing for concepts in neural networks

C Lovering, E Pavlick - Transactions of the Association for …, 2022 - direct.mit.edu
Many complex problems are naturally understood in terms of symbolic concepts. For
example, our concept of “cat” is related to our concepts of “ears” and “whiskers” in a non …

How abstract is linguistic generalization in large language models? Experiments with argument structure

M Wilson, J Petty, R Frank - Transactions of the Association for …, 2023 - direct.mit.edu
Language models are typically evaluated on their success at predicting the
distribution of specific words in specific contexts. Yet linguistic knowledge also encodes …

Grokking of hierarchical structure in vanilla transformers

S Murty, P Sharma, J Andreas, CD Manning - arXiv preprint arXiv …, 2023 - arxiv.org
For humans, language production and comprehension are sensitive to the hierarchical
structure of sentences. In natural language processing, past work has questioned how …

How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech

A Yedetore, T Linzen, R Frank, RT McCoy - arXiv preprint arXiv …, 2023 - arxiv.org
When acquiring syntax, children consistently choose hierarchical rules over competing
non-hierarchical possibilities. Is this preference due to a learning bias for hierarchical structure …

Language model acceptability judgements are not always robust to context

K Sinha, J Gauthier, A Mueller, K Misra… - arXiv preprint arXiv …, 2022 - arxiv.org
Targeted syntactic evaluations of language models ask whether models show stable
preferences for syntactically acceptable content over minimal-pair unacceptable inputs. Most …

The Impact of Depth on Compositional Generalization in Transformer Language Models

J Petty, S Steenkiste, I Dasgupta, F Sha… - Proceedings of the …, 2024 - aclanthology.org
To process novel sentences, language models (LMs) must generalize compositionally—
combine familiar elements in new ways. What aspects of a model's structure promote …

How to plant trees in language models: Data and architectural effects on the emergence of syntactic inductive biases

A Mueller, T Linzen - arXiv preprint arXiv:2305.19905, 2023 - arxiv.org
Accurate syntactic representations are essential for robust generalization in natural
language. Recent work has found that pre-training can teach language models to rely on …