Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …
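As a concrete illustration of the flipped-label setup this snippet mentions, the sketch below builds a few-shot sentiment prompt in which every demonstration label is swapped, so a model that follows the in-context input-label mapping must override its semantic prior. The review texts and the build_prompt helper are invented for illustration and are not taken from the paper.

# Hypothetical illustration of the "flipped labels" ICL setup: demonstrations are
# shown with swapped sentiment labels, so following the demonstrations requires
# overriding the semantic prior that e.g. "wonderful" is positive.
demos = [
    ("The movie was great and I loved it.", "positive"),
    ("Terrible service, I want a refund.", "negative"),
    ("An absolute masterpiece of a film.", "positive"),
]

FLIP = {"positive": "negative", "negative": "positive"}

def build_prompt(demos, query, flip_labels=False):
    """Assemble a few-shot prompt; optionally flip every demonstration label."""
    lines = []
    for text, label in demos:
        shown = FLIP[label] if flip_labels else label
        lines.append(f"Review: {text}\nSentiment: {shown}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

print(build_prompt(demos, "What a wonderful experience!", flip_labels=True))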

Tensor attention training: Provably efficient learning of higher-order transformers

Y Liang, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2405.16411, 2024 - arxiv.org
Tensor Attention, a multi-view attention that is able to capture high-order correlations among
multiple modalities, can overcome the representational limitations of classical matrix …

Conv-Basis: A new paradigm for efficient attention inference and gradient computation in transformers

Y Liang, H Liu, Z Shi, Z Song, Z Xu, J Yin - arXiv preprint arXiv:2405.05219, 2024 - arxiv.org
The self-attention mechanism is the key to the success of transformers in recent Large
Language Models (LLMs). However, the quadratic computational cost $O(n^2)$ in the …
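For reference, the quadratic cost mentioned here comes from materializing an n x n score matrix in standard softmax attention over a length-n sequence. The NumPy sketch below shows only that baseline computation; it is not the conv-basis method proposed in the paper.

# Minimal sketch of standard softmax attention, showing where the O(n^2)
# cost arises: the explicit (n, n) score matrix.
import numpy as np

def softmax_attention(Q, K, V):
    """Q, K, V: (n, d) arrays. Builds an explicit n x n attention matrix."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                  # (n, n): the quadratic bottleneck
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V                             # (n, d)

n, d = 1024, 64
rng = np.random.default_rng(0)
out = softmax_attention(rng.normal(size=(n, d)),
                        rng.normal(size=(n, d)),
                        rng.normal(size=(n, d)))
print(out.shape)  # (1024, 64); time and memory scale with n^2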

Multi-layer transformers gradient can be approximated in almost linear time

Y Liang, Z Sha, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2408.13233, 2024 - arxiv.org
The computational complexity of the self-attention mechanism in popular transformer
architectures poses significant challenges for training and inference, and becomes the …

HSR-enhanced sparse attention acceleration

B Chen, Y Liang, Z Sha, Z Shi, Z Song - arXiv preprint arXiv:2410.10165, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
applications, but their performance on long-context tasks is often limited by the …

Is a picture worth a thousand words? Delving into spatial reasoning for vision language models

J Wang, Y Ming, Z Shi, V Vineet, X Wang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) and vision-language models (VLMs) have demonstrated
remarkable performance across a wide range of tasks and domains. Despite this promise …

Looped ReLU MLPs may be all you need as practical programmable computers

Y Liang, Z Sha, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2410.09375, 2024 - arxiv.org
Previous work has demonstrated that attention mechanisms are Turing complete. More
recently, it has been shown that a looped 13-layer Transformer can function as a universal …

Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent

B Chen, X Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2410.11268, 2024 - arxiv.org
In-context learning has been recognized as a key factor in the success of Large Language
Models (LLMs). It refers to the model's ability to learn patterns on the fly from provided in …
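The "multi-step gradient descent" view referenced in this title is often illustrated with a toy linear-regression task, where each loop iteration plays the role of one gradient step on the in-context examples. The sketch below is that standard toy picture under assumed data and step size, not the paper's looped-transformer construction.

# Toy sketch of "in-context learning as multi-step gradient descent": each loop
# iteration applies one least-squares gradient step on the in-context (x, y) pairs.
# Data, learning rate, and step count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
d, n_ctx = 8, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n_ctx, d))          # in-context inputs
y = X @ w_true                           # in-context labels
x_query = rng.normal(size=d)

w = np.zeros(d)
lr = 0.05
for step in range(200):                  # "looped" refinement
    grad = X.T @ (X @ w - y) / n_ctx     # least-squares gradient
    w -= lr * grad

print("prediction:", x_query @ w)
print("target    :", x_query @ w_true)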

Differentially private attention computation

Y Gao, Z Song, X Yang, Y Zhou - arXiv preprint arXiv:2305.04701, 2023 - arxiv.org
Large language models (LLMs) have had a profound impact on numerous aspects of daily
life including natural language processing, content generation, research methodologies and …

Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic

J Gu, C Li, Y Liang, Z Shi, Z Song… - arXiv preprint arXiv …, 2024 - openreview.net
In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the
internal representations harnessed by neural networks and Transformers. Building on recent …