OPT: Open pre-trained transformer language models, 2022

Pythia: A suite for analyzing large language models across training and scaling

S Biderman, H Schoelkopf… - International …, 2023 - proceedings.mlr.press

How do large language models (LLMs) develop and evolve over the course of training?
How do these patterns change as models scale? To answer these questions, we introduce …

Cited by 692 · Related articles · All 7 versions

[PDF] mlr.press

SmoothQuant: Accurate and efficient post-training quantization for large language models

G Xiao, J Lin, M Seznec, H Wu… - International …, 2023 - proceedings.mlr.press

Large language models (LLMs) show excellent performance but are compute- and memory-intensive.
Quantization can reduce memory and accelerate inference. However, existing …

Cited by 518 · Related articles · All 7 versions

[PDF] mlr.press

DetectGPT: Zero-shot machine-generated text detection using probability curvature

E Mitchell, Y Lee, A Khazatsky… - International …, 2023 - proceedings.mlr.press

The increasing fluency and widespread usage of large language models (LLMs) highlight
the desirability of corresponding tools aiding detection of LLM-generated text. In this paper …

Cited by 420 · Related articles · All 6 versions

[PDF] open-publishing.org

Academic Integrity considerations of AI Large Language Models in the post-pandemic era: ChatGPT and beyond

M Perkins - Journal of University Teaching and Learning …, 2023 - open-publishing.org

This paper explores the academic integrity considerations of students' use of Artificial
Intelligence (AI) tools using Large Language Models (LLMs) such as ChatGPT in formal …

Cited by 430 · Related articles · All 5 versions

[PDF] cell.com

Can large language models reason about medical questions?

V Liévin, CE Hother, AG Motzfeldt, O Winther - Patterns, 2024 - cell.com

Although large language models often produce impressive outputs, it remains unclear how
they perform in real-world scenarios requiring strong reasoning skills and expert domain …

Cited by 215 · Related articles · All 9 versions

[PDF] mlr.press

Synthetic prompting: Generating chain-of-thought demonstrations for large language models

Z Shao, Y Gong, Y Shen, M Huang… - International …, 2023 - proceedings.mlr.press

Large language models can perform various reasoning tasks by using chain-of-thought
prompting, which guides them to find answers through step-by-step demonstrations …

Cited by 75 · Related articles · All 6 versions

[PDF] neurips.cc

EvoPrompting: language models for code-level neural architecture search

A Chen, D Dohan, D So - Advances in Neural Information …, 2024 - proceedings.neurips.cc

Given the recent impressive accomplishments of language models (LMs) for code
generation, we explore the use of LMs as general adaptive mutation and crossover …

Cited by 51 · Related articles · All 5 versions

[PDF] arxiv.org

Cerebras-GPT: Open compute-optimal language models trained on the Cerebras wafer-scale cluster

N Dey, G Gosal, H Khachane, W Marshall… - arXiv preprint arXiv …, 2023 - arxiv.org

We study recent research advances that improve large language models through efficient
pre-training and scaling, and open datasets and tools. We combine these advances to …

Cited by 58 · Related articles · All 2 versions

[PDF] neurips.cc

No train no gain: Revisiting efficient training algorithms for transformer-based language models

J Kaddour, O Key, P Nawrot… - Advances in Neural …, 2024 - proceedings.neurips.cc

The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient training …

Cited by 18 · Related articles · All 5 versions

[PDF] arxiv.org

MM-VID: Advancing video understanding with GPT-4V(ision)

K Lin, F Ahmed, L Li, CC Lin, E Azarnasab… - arXiv preprint arXiv …, 2023 - arxiv.org

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V,
combined with specialized tools in vision, audio, and speech, to facilitate advanced video …

Cited by 35 · Related articles · All 2 versions