A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Conversational agents in therapeutic interventions for neurodevelopmental disorders: a survey

F Catania, M Spitale, F Garzotto - ACM Computing Surveys, 2023 - dl.acm.org
Neurodevelopmental Disorders (NDD) are a group of conditions with onset in the
developmental period characterized by deficits in the cognitive and social areas …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity

W Fedus, B Zoph, N Shazeer - Journal of Machine Learning Research, 2022 - jmlr.org
In deep learning, models typically reuse the same parameters for all inputs. Mixture of
Experts (MoE) models defy this and instead select different parameters for each incoming …
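
As a quick illustration of the routing idea in this entry, the following is a minimal sketch of a top-1 ("switch") Mixture-of-Experts layer in PyTorch; the class name SwitchFFN, the expert sizes, and the gate-scaling choice are assumptions made for the example, not the paper's released implementation, and details such as capacity limits and the load-balancing loss are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    # Hypothetical top-1 routing layer: each token is processed by a single expert FFN.
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # per-token gating scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                              # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)       # (num_tokens, num_experts)
        prob, idx = gate.max(dim=-1)                    # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # scale by the gate probability so the router still receives gradients
                out[mask] = prob[mask, None] * expert(x[mask])
        return out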

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
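
A toy example of the magnitude pruning this survey discusses: the helper below, a sketch in which the 90% sparsity level and the random layer are chosen only for illustration, zeroes the smallest-magnitude entries of a weight tensor.

import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Zero the `sparsity` fraction of entries with the smallest absolute value.
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

# Example: keep roughly 10% of a randomly initialised 256x256 layer.
w = torch.randn(256, 256)
w_sparse = magnitude_prune(w, 0.9)
print(f"nonzeros remaining: {(w_sparse != 0).float().mean().item():.2%}")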

Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

Nyströmformer: A Nyström-based algorithm for approximating self-attention

Y Xiong, Z Zeng, R Chakraborty, M Tan… - Proceedings of the …, 2021 - ojs.aaai.org
Transformers have emerged as a powerful tool for a broad range of natural language
processing tasks. A key component that drives the impressive performance of Transformers …

Memorizing transformers

Y Wu, MN Rabe, DL Hutchins, C Szegedy - arXiv preprint arXiv …, 2022 - arxiv.org
Language models typically need to be trained or finetuned in order to acquire new
knowledge, which involves updating their weights. We instead envision language models …

Transformers are RNNs: Fast autoregressive transformers with linear attention

A Katharopoulos, A Vyas, N Pappas… - … on machine learning, 2020 - proceedings.mlr.press
Transformers achieve remarkable performance in several tasks but due to their quadratic
complexity, with respect to the input's length, they are prohibitively slow for very long …
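
The linear-attention trick referenced in the title can be summarised in a few lines: with a feature map φ, softmax attention is replaced by φ(Q)(φ(K)ᵀV), so the cost grows linearly rather than quadratically with sequence length. Below is a minimal non-causal NumPy sketch assuming the elu(x)+1 feature map; array names and sizes are illustrative, and the autoregressive (causal, cumulative-sum) variant from the paper is omitted.

import numpy as np

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1 used to linearise attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # O(N * d^2) instead of O(N^2 * d): phi(Q) @ (phi(K)^T V), normalised per query.
    Qp, Kp = elu_plus_one(Q), elu_plus_one(K)   # (N, d)
    kv = Kp.T @ V                               # (d, d) summary of all keys/values
    z = Qp @ Kp.sum(axis=0)                     # (N,) per-query normaliser
    return (Qp @ kv) / z[:, None]

N, d = 1024, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = linear_attention(Q, K, V)                 # (N, d), cost linear in N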

Generating radiology reports via memory-driven transformer

Z Chen, Y Song, TH Chang, X Wan - arXiv preprint arXiv:2010.16056, 2020 - arxiv.org
Medical imaging is frequently used in clinical practice and trials for diagnosis and treatment.
Writing imaging reports is time-consuming and can be error-prone for inexperienced …