Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Gemma 2: Improving open language models at a practical size

Gemma Team, M Riviere, S Pathak, PG Sessa… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-
of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new …

PaLM 2 technical report

R Anil, AM Dai, O Firat, M Johnson, D Lepikhin… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and
reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is …

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team, P Georgiev, VI Lei, R Burnell, L Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce the Gemini 1.5 family of models, representing the next generation
of highly compute-efficient multimodal models capable of recalling and reasoning over fine …

Gemma: Open models based on Gemini research and technology

Gemma Team, T Mesnard, C Hardin, R Dadashi… - arXiv preprint arXiv …, 2024 - arxiv.org
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from
the research and technology used to create Gemini models. Gemma models demonstrate …

TPU v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings

N Jouppi, G Kurian, S Li, P Ma, R Nagarajan… - Proceedings of the 50th …, 2023 - dl.acm.org
In response to innovations in machine learning (ML) models, production workloads changed
radically and rapidly. TPU v4 is the fifth Google domain-specific architecture (DSA) and its …

Efficiently scaling transformer inference

R Pope, S Douglas, A Chowdhery… - Proceedings of …, 2023 - proceedings.mlsys.org
We study the problem of efficient generative inference for Transformer models, in one of its
most challenging settings: large deep models, with tight latency targets and long sequence …

Scaling autoregressive models for content-rich text-to-image generation

J Yu, Y Xu, JY Koh, T Luong, G Baid, Z Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
We present the Pathways [1] Autoregressive Text-to-Image (Parti) model, which
generates high-fidelity photorealistic images and supports content-rich synthesis involving …

CoCa: Contrastive captioners are image-text foundation models

J Yu, Z Wang, V Vasudevan, L Yeung… - arXiv preprint arXiv …, 2022 - arxiv.org
Exploring large-scale pretrained foundation models is of significant interest in computer
vision because these models can be quickly transferred to many downstream tasks. This …

PaLM: Scaling language modeling with Pathways

A Chowdhery, S Narang, J Devlin, M Bosma… - Journal of Machine …, 2023 - jmlr.org
Large language models have been shown to achieve remarkable performance across a
variety of natural language tasks using few-shot learning, which drastically reduces the …