When large language models meet personalization: Perspectives of challenges and opportunities

J Chen, Z Liu, X Huang, C Wu, Q Liu, G Jiang, Y Pu… - World Wide Web, 2024 - Springer
The advent of large language models marks a revolutionary breakthrough in artificial
intelligence. With the unprecedented scale of training and model parameters, the capability …

Large language models and causal inference in collaboration: A comprehensive survey

X Liu, P Xu, J Wu, J Yuan, Y Yang, Y Zhou, F Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Causal inference has shown potential in enhancing the predictive accuracy, fairness,
robustness, and explainability of Natural Language Processing (NLP) models by capturing …

Towards automated circuit discovery for mechanistic interpretability

A Conmy, A Mavor-Parker, A Lynch… - Advances in …, 2023 - proceedings.neurips.cc
Through considerable effort and intuition, several recent works have reverse-engineered
nontrivial behaviors of transformer models. This paper systematizes the mechanistic …

LEACE: Perfect linear concept erasure in closed form

N Belrose, D Schneider-Joseph… - Advances in …, 2024 - proceedings.neurips.cc
Concept erasure aims to remove specified features from a representation. It can
improve fairness (e.g. preventing a classifier from using gender or race) and interpretability …

Radiology-Llama2: Best-in-class large language model for radiology

Z Liu, Y Li, P Shu, A Zhong, L Yang, C Ju, Z Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces Radiology-Llama2, a large language model specialized for radiology
through a process known as instruction tuning. Radiology-Llama2 is based on the Llama2 …

Does circuit analysis interpretability scale? Evidence from multiple choice capabilities in Chinchilla

T Lieberum, M Rahtz, J Kramár, N Nanda… - arXiv preprint arXiv …, 2023 - arxiv.org
Circuit analysis is a promising technique for understanding the internal mechanisms
of language models. However, existing analyses are done in small models far from the state …

A survey on interpretable reinforcement learning

C Glanois, P Weng, M Zimmer, D Li, T Yang, J Hao… - Machine Learning, 2024 - Springer
Although deep reinforcement learning has become a promising machine learning approach
for sequential decision-making problems, it is still not mature enough for high-stake domains …

Rethinking interpretability in the era of large language models

C Singh, JP Inala, M Galley, R Caruana… - arXiv preprint arXiv …, 2024 - arxiv.org
Interpretable machine learning has exploded as an area of interest over the last decade,
sparked by the rise of increasingly large datasets and deep neural networks …

Towards best practices of activation patching in language models: Metrics and methods

F Zhang, N Nanda - arXiv preprint arXiv:2309.16042, 2023 - arxiv.org
Mechanistic interpretability seeks to understand the internal mechanisms of machine
learning models, where localization--identifying the important model components--is a key …

Evaluating language models for mathematics through interactions

KM Collins, AQ Jiang, S Frieder… - Proceedings of the …, 2024 - National Acad Sciences
There is much excitement about the opportunity to harness the power of large language
models (LLMs) when building problem-solving assistants. However, the standard …