Pythia: A suite for analyzing large language models across training and scaling

S Biderman, H Schoelkopf… - International …, 2023 - proceedings.mlr.press
How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …

SmoothQuant: Accurate and efficient post-training quantization for large language models

G Xiao, J Lin, M Seznec, H Wu… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) show excellent performance but are compute- and memory-intensive.
Quantization can reduce memory and accelerate inference. However, existing …

DetectGPT: Zero-shot machine-generated text detection using probability curvature

E Mitchell, Y Lee, A Khazatsky… - International …, 2023 - proceedings.mlr.press
The increasing fluency and widespread usage of large language models (LLMs) highlight
the desirability of corresponding tools aiding detection of LLM-generated text. In this paper …

Academic Integrity considerations of AI Large Language Models in the post-pandemic era: ChatGPT and beyond

M Perkins - Journal of University Teaching and Learning …, 2023 - open-publishing.org
This paper explores the academic integrity considerations of students' use of Artificial
Intelligence (AI) tools using Large Language Models (LLMs) such as ChatGPT in formal …

Can large language models reason about medical questions?

V Liévin, CE Hother, AG Motzfeldt, O Winther - Patterns, 2024 - cell.com
Although large language models often produce impressive outputs, it remains unclear how
they perform in real-world scenarios requiring strong reasoning skills and expert domain …

Synthetic prompting: Generating chain-of-thought demonstrations for large language models

Z Shao, Y Gong, Y Shen, M Huang… - International …, 2023 - proceedings.mlr.press
Large language models can perform various reasoning tasks by using chain-of-thought
prompting, which guides them to find answers through step-by-step demonstrations …

EvoPrompting: language models for code-level neural architecture search

A Chen, D Dohan, D So - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Given the recent impressive accomplishments of language models (LMs) for code
generation, we explore the use of LMs as general adaptive mutation and crossover …

Cerebras-GPT: Open compute-optimal language models trained on the Cerebras wafer-scale cluster

N Dey, G Gosal, H Khachane, W Marshall… - arXiv preprint arXiv …, 2023 - arxiv.org
We study recent research advances that improve large language models through efficient
pre-training and scaling, and open datasets and tools. We combine these advances to …

No train no gain: Revisiting efficient training algorithms for transformer-based language models

J Kaddour, O Key, P Nawrot… - Advances in Neural …, 2024 - proceedings.neurips.cc
The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient training …

MM-VID: Advancing video understanding with GPT-4V(ision)

K Lin, F Ahmed, L Li, CC Lin, E Azarnasab… - arXiv preprint arXiv …, 2023 - arxiv.org
We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V,
combined with specialized tools in vision, audio, and speech, to facilitate advanced video …