Larger language models do in-context learning differently
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …
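
As background, a minimal sketch of how a flipped-label ICL prompt can be built: the demonstration labels are inverted while the query is left untouched, so a model that follows the in-context input-label mapping must answer against its semantic priors. The template, label set, and examples below are illustrative assumptions, not the paper's exact setup.

# Sketch: an in-context prompt whose demonstration labels are flipped,
# probing whether a model follows in-context input-label mappings or its
# semantic priors. Template and examples are illustrative assumptions.

def flip(label: str) -> str:
    """Invert a binary sentiment label."""
    return {"positive": "negative", "negative": "positive"}[label]

def build_prompt(demos, query, flipped=False):
    """Format (text, label) demonstrations plus a query into one ICL prompt."""
    lines = []
    for text, label in demos:
        shown = flip(label) if flipped else label
        lines.append(f"Input: {text}\nLabel: {shown}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

demos = [("A delightful film.", "positive"), ("Utterly boring.", "negative")]
print(build_prompt(demos, "I loved every minute.", flipped=True))
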
Tensor attention training: Provably efficient learning of higher-order transformers
Tensor Attention, a multi-view attention mechanism that captures high-order correlations among
multiple modalities, can overcome the representational limitations of classical matrix …
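
For intuition, here is a sketch of one trilinear formulation of higher-order attention, in which each score couples a query with entries from two context sequences at once before a joint softmax. This is an illustrative instance of tensor attention assumed for exposition, not necessarily the paper's exact definition.

import numpy as np

# Third-order ("tensor") attention over two context sequences: the score
# s[i, j, k] = <q_i, k1_j * k2_k> (elementwise product) couples three
# vectors, softmax runs over all (j, k) pairs, and the value for a pair is
# v1_j * v2_k. One common trilinear formulation, assumed for illustration.

def tensor_attention(Q, K1, K2, V1, V2):
    n, d = Q.shape
    S = np.einsum("ih,jh,kh->ijk", Q, K1, K2) / np.sqrt(d)
    S = S.reshape(n, -1)                       # flatten the (j, k) pairs
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)          # softmax over all pairs
    V = np.einsum("jh,kh->jkh", V1, V2).reshape(-1, V1.shape[1])
    return A @ V                               # (n, d) output

rng = np.random.default_rng(0)
Q, K1, K2, V1, V2 = (rng.normal(size=(4, 8)) for _ in range(5))
print(tensor_attention(Q, K1, K2, V1, V2).shape)  # (4, 8)
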
Algorithm and hardness for dynamic attention maintenance in large language models
Large language models (LLMs) have fundamentally changed human life. The
attention scheme is one of the key components across all LLMs, such as BERT, GPT-1 …
Multi-layer transformers gradient can be approximated in almost linear time
The computational complexity of the self-attention mechanism in popular transformer
architectures poses significant challenges for training and inference, and becomes the …
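
The bottleneck referred to here is the n × n score matrix of standard softmax attention; below is a minimal sketch of that naive quadratic computation, i.e. the baseline an almost-linear-time approximation targets, not the paper's algorithm.

import numpy as np

# Standard softmax self-attention: materializing the n x n score matrix
# Q K^T costs O(n^2 d) time and O(n^2) memory, which is the quadratic
# bottleneck. Illustrative baseline only, not the paper's method.

def attention(Q, K, V):
    d = Q.shape[1]
    S = Q @ K.T / np.sqrt(d)                    # (n, n) score matrix
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)           # row-wise softmax
    return A @ V

rng = np.random.default_rng(0)
n, d = 6, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(attention(Q, K, V).shape)  # (6, 4)
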
Is a picture worth a thousand words? Delving into spatial reasoning for vision language models
Large language models (LLMs) and vision-language models (VLMs) have demonstrated
remarkable performance across a wide range of tasks and domains. Despite this promise …
Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent
In-context learning has been recognized as a key factor in the success of Large Language
Models (LLMs). It refers to the model's ability to learn patterns on the fly from provided in …
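
One standard way to make the ICL-as-gradient-descent connection concrete is to view each loop iteration as a single gradient step on the least-squares loss over the in-context examples. The sketch below illustrates that interpretation in plain numpy; it is not the paper's transformer construction, and the step size and iteration count are arbitrary choices.

import numpy as np

# Interpretation sketch: each "loop" of a looped transformer acts as one
# gradient-descent step on 0.5 * ||Xw - y||^2 over the in-context (x, y)
# examples, so more loops mean more GD steps. Illustrative only.

def icl_as_multistep_gd(X, y, steps, lr):
    """Run `steps` GD iterations on the in-context least-squares loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)   # one gradient step per loop iteration
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 4))
w_star = rng.normal(size=4)
y = X @ w_star
w = icl_as_multistep_gd(X, y, steps=200, lr=0.01)
print(np.linalg.norm(w - w_star))  # shrinks toward 0 as steps grow
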
Differentially private attention computation
Large language models (LLMs) have had a profound impact on numerous aspects of daily
life including natural language processing, content generation, research methodologies and …
Differential privacy mechanisms in neural tangent kernel regression
Training data privacy is a fundamental problem in modern Artificial Intelligence (AI)
applications, such as face recognition, recommendation systems, language generation, and …
Exploring the frontiers of softmax: Provable optimization, applications in diffusion model, and beyond
The softmax activation function plays a crucial role in the success of large language models
(LLMs), particularly in the self-attention mechanism of the widely adopted Transformer …
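
For reference, the softmax in question, written in its numerically stable shift-invariant form; subtracting the row maximum leaves the output unchanged while avoiding overflow. This is illustrative background, not the paper's contribution.

import numpy as np

# Numerically stable softmax, the normalization at the heart of
# self-attention. Softmax is shift-invariant, so subtracting the max
# changes nothing mathematically but prevents exp() overflow.

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))           # sums to 1
print(softmax(np.array([1001.0, 1002.0, 1003.0])))  # same output, no overflow
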
Do large language models have compositional ability? an investigation into limitations and scalability
Large language models (LLMs) have emerged as a powerful tool exhibiting remarkable in-
context learning (ICL) capabilities. In this study, we delve into the ICL capabilities of LLMs on …