One fits all: Power general time series analysis by pretrained LM
Although we have witnessed great success of pre-trained models in natural language
processing (NLP) and computer vision (CV), limited progress has been made for general …
Explainability for large language models: A survey
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …
Weak-to-strong generalization: Eliciting strong capabilities with weak supervision
Widely used alignment techniques, such as reinforcement learning from human feedback
(RLHF), rely on the ability of humans to supervise model behavior; for example, to evaluate …
How do transformers learn topic structure: Towards a mechanistic understanding
While the successes of transformers across many domains are indisputable, accurate
understanding of the learning mechanics is still largely lacking. Their capabilities have been …
Consciousness in artificial intelligence: insights from the science of consciousness
Whether current or near-term AI systems could be conscious is a topic of scientific interest
and increasing public concern. This report argues for, and exemplifies, a rigorous and …
Inductive biases and variable creation in self-attention mechanisms
Self-attention, an architectural motif designed to model long-range interactions in sequential
data, has driven numerous recent breakthroughs in natural language processing and …
AttentionViz: A global view of transformer attention
Transformer models are revolutionizing machine learning, but their inner workings remain
mysterious. In this work, we present a new visualization technique designed to help …
A mechanistic understanding of alignment algorithms: A case study on DPO and toxicity
While alignment algorithms are now commonly used to tune pre-trained language models
towards a user's preferences, we lack explanations for the underlying mechanisms in which …
MoE-Mamba: Efficient selective state space models with mixture of experts
State Space Models (SSMs) have become serious contenders in the field of sequential
modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts …
Scaling laws and interpretability of learning from repeated data
D Hernandez, T Brown, T Conerly, N DasSarma… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent large language models have been trained on vast datasets, but also often on
repeated data, either intentionally for the purpose of upweighting higher quality data, or …