Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Survey of vulnerabilities in large language models revealed by adversarial attacks

E Shayegani, MAA Mamun, Y Fu, P Zaree… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as
they integrate more deeply into complex systems, the urgency to scrutinize their security …

Weak-to-strong generalization: Eliciting strong capabilities with weak supervision

C Burns, P Izmailov, JH Kirchner, B Baker… - arXiv preprint arXiv …, 2023 - arxiv.org
Widely used alignment techniques, such as reinforcement learning from human feedback
(RLHF), rely on the ability of humans to supervise model behavior, for example, to evaluate …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Frontier AI regulation: Managing emerging risks to public safety

M Anderljung, J Barnhart, A Korinek, J Leung… - arXiv preprint arXiv …, 2023 - arxiv.org
Advanced AI models hold the promise of tremendous benefits for humanity, but society
needs to proactively manage the accompanying risks. In this paper, we focus on what we …

Reasoning or reciting? Exploring the capabilities and limitations of language models through counterfactual tasks

Z Wu, L Qiu, A Ross, E Akyürek, B Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
The impressive performance of recent language models across a wide range of tasks
suggests that they possess a degree of abstract reasoning skills. Are these skills general …

Do LLMs exhibit human-like response biases? A case study in survey design

L Tjuatja, V Chen, T Wu, A Talwalkar… - Transactions of the …, 2024 - direct.mit.edu
One widely cited barrier to the adoption of LLMs as proxies for humans in subjective tasks is
their sensitivity to prompt wording—but interestingly, humans also display sensitivities to …

Machine psychology: Investigating emergent capabilities and behavior in large language models using psychological methods

T Hagendorff - arXiv preprint arXiv:2303.13988, 2023 - cybershafarat.com
Large language models (LLMs) are currently at the forefront of intertwining AI systems with
human communication and everyday life. Due to rapid technological advances and their …

Embers of autoregression: Understanding large language models through the problem they are trained to solve

RT McCoy, S Yao, D Friedman, M Hardy… - arXiv preprint arXiv …, 2023 - arxiv.org
The widespread adoption of large language models (LLMs) makes it important to recognize
their strengths and limitations. We argue that in order to develop a holistic understanding of …

MoCa: Measuring human-language model alignment on causal and moral judgment tasks

A Nie, Y Zhang, AS Amdekar, C Piech… - Advances in …, 2023 - proceedings.neurips.cc
Human commonsense understanding of the physical and social world is organized around
intuitive theories. These theories support making causal and moral judgments. When …