One fits all: Power general time series analysis by pretrained LM

T Zhou, P Niu, L Sun, R Jin - Advances in neural …, 2023 - proceedings.neurips.cc
Although we have witnessed great success of pre-trained models in natural language
processing (NLP) and computer vision (CV), limited progress has been made for general …

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear, and this lack of …

Weak-to-strong generalization: Eliciting strong capabilities with weak supervision

C Burns, P Izmailov, JH Kirchner, B Baker… - arXiv preprint arXiv …, 2023 - arxiv.org
Widely used alignment techniques, such as reinforcement learning from human feedback
(RLHF), rely on the ability of humans to supervise model behavior, for example, to evaluate …

How do transformers learn topic structure: Towards a mechanistic understanding

Y Li, Y Li, A Risteski - International Conference on Machine …, 2023 - proceedings.mlr.press
While the successes of transformers across many domains are indisputable, accurate
understanding of the learning mechanics is still largely lacking. Their capabilities have been …

Consciousness in artificial intelligence: Insights from the science of consciousness

P Butlin, R Long, E Elmoznino, Y Bengio… - arXiv preprint arXiv …, 2023 - arxiv.org
Whether current or near-term AI systems could be conscious is a topic of scientific interest
and increasing public concern. This report argues for, and exemplifies, a rigorous and …

Inductive biases and variable creation in self-attention mechanisms

BL Edelman, S Goel, S Kakade… - … on Machine Learning, 2022 - proceedings.mlr.press
Self-attention, an architectural motif designed to model long-range interactions in sequential
data, has driven numerous recent breakthroughs in natural language processing and …

AttentionViz: A global view of transformer attention

C Yeh, Y Chen, A Wu, C Chen, F Viégas… - … on Visualization and …, 2023 - ieeexplore.ieee.org
Transformer models are revolutionizing machine learning, but their inner workings remain
mysterious. In this work, we present a new visualization technique designed to help …

A mechanistic understanding of alignment algorithms: A case study on DPO and toxicity

A Lee, X Bai, I Pres, M Wattenberg… - arXiv preprint arXiv …, 2024 - arxiv.org
While alignment algorithms are now commonly used to tune pre-trained language models
towards a user's preferences, we lack explanations for the underlying mechanisms in which …

MoE-Mamba: Efficient selective state space models with mixture of experts

M Pióro, K Ciebiera, K Król, J Ludziejewski… - arXiv preprint arXiv …, 2024 - arxiv.org
State Space Models (SSMs) have become serious contenders in the field of sequential
modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts …

Scaling laws and interpretability of learning from repeated data

D Hernandez, T Brown, T Conerly, N DasSarma… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent large language models have been trained on vast datasets, but also often on
repeated data, either intentionally for the purpose of upweighting higher quality data, or …