Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Gemma 2: Improving open language models at a practical size

Gemma Team, M Riviere, S Pathak, PG Sessa… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-
of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new …

PaLM 2 technical report

R Anil, AM Dai, O Firat, M Johnson, D Lepikhin… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and
reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is …

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team, P Georgiev, VI Lei, R Burnell, L Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce the Gemini 1.5 family of models, representing the next generation
of highly compute-efficient multimodal models capable of recalling and reasoning over fine …

Gemma: Open models based on Gemini research and technology

Gemma Team, T Mesnard, C Hardin, R Dadashi… - arXiv preprint arXiv …, 2024 - arxiv.org
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from
the research and technology used to create Gemini models. Gemma models demonstrate …

TPU v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings

N Jouppi, G Kurian, S Li, P Ma, R Nagarajan… - Proceedings of the 50th …, 2023 - dl.acm.org
In response to innovations in machine learning (ML) models, production workloads changed
radically and rapidly. TPU v4 is the fifth Google domain-specific architecture (DSA) and its …

Efficiently scaling transformer inference

R Pope, S Douglas, A Chowdhery… - Proceedings of …, 2023 - proceedings.mlsys.org
We study the problem of efficient generative inference for Transformer models, in one of its
most challenging settings: large deep models, with tight latency targets and long sequence …

Scaling autoregressive models for content-rich text-to-image generation

J Yu, Y Xu, JY Koh, T Luong, G Baid, Z Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
We present the Pathways [1] Autoregressive Text-to-Image (Parti) model, which
generates high-fidelity photorealistic images and supports content-rich synthesis involving …

CoCa: Contrastive captioners are image-text foundation models

J Yu, Z Wang, V Vasudevan, L Yeung… - arXiv preprint arXiv …, 2022 - arxiv.org
Exploring large-scale pretrained foundation models is of significant interest in computer
vision because these models can be quickly transferred to many downstream tasks. This …

PaLM: Scaling language modeling with Pathways

A Chowdhery, S Narang, J Devlin, M Bosma… - Journal of Machine …, 2023 - jmlr.org
Large language models have been shown to achieve remarkable performance across a
variety of natural language tasks using few-shot learning, which drastically reduces the …