A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Conversational agents in therapeutic interventions for neurodevelopmental disorders: a survey

F Catania, M Spitale, F Garzotto - ACM Computing Surveys, 2023 - dl.acm.org
Neurodevelopmental Disorders (NDD) are a group of conditions with onset in the
developmental period characterized by deficits in the cognitive and social areas …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity

W Fedus, B Zoph, N Shazeer - Journal of Machine Learning Research, 2022 - jmlr.org
In deep learning, models typically reuse the same parameters for all inputs. Mixture of
Experts (MoE) models defy this and instead select different parameters for each incoming …
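
As a quick illustration of the routing idea in this entry, the following is a minimal sketch of a top-1 ("switch") Mixture-of-Experts layer in PyTorch; the class name SwitchFFN, the expert sizes, and the gate-scaling choice are assumptions made for the example, not the paper's released implementation, and details such as capacity limits and the load-balancing loss are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    # Hypothetical top-1 routing layer: each token is processed by a single expert FFN.
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # per-token gating scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                              # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)       # (num_tokens, num_experts)
        prob, idx = gate.max(dim=-1)                    # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # scale by the gate probability so the router still receives gradients
                out[mask] = prob[mask, None] * expert(x[mask])
        return out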

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
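
A toy example of the magnitude pruning this survey discusses: the helper below, a sketch in which the 90% sparsity level and the random layer are chosen only for illustration, zeroes the smallest-magnitude entries of a weight tensor.

import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Zero the `sparsity` fraction of entries with the smallest absolute value.
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

# Example: keep roughly 10% of a randomly initialised 256x256 layer.
w = torch.randn(256, 256)
w_sparse = magnitude_prune(w, 0.9)
print(f"nonzeros remaining: {(w_sparse != 0).float().mean().item():.2%}")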

Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

Nyströmformer: A Nyström-based algorithm for approximating self-attention

Y Xiong, Z Zeng, R Chakraborty, M Tan… - Proceedings of the …, 2021 - ojs.aaai.org
Transformers have emerged as a powerful tool for a broad range of natural language
processing tasks. A key component that drives the impressive performance of Transformers …

Memorizing transformers

Y Wu, MN Rabe, DL Hutchins, C Szegedy - arXiv preprint arXiv …, 2022 - arxiv.org
Language models typically need to be trained or finetuned in order to acquire new
knowledge, which involves updating their weights. We instead envision language models …

Transformers are RNNs: Fast autoregressive transformers with linear attention

A Katharopoulos, A Vyas, N Pappas… - … on machine learning, 2020 - proceedings.mlr.press
Transformers achieve remarkable performance in several tasks but due to their quadratic
complexity, with respect to the input's length, they are prohibitively slow for very long …
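
The linear-attention trick referenced in the title can be summarised in a few lines: with a feature map φ, softmax attention is replaced by φ(Q)(φ(K)ᵀV), so the cost grows linearly rather than quadratically with sequence length. Below is a minimal non-causal NumPy sketch assuming the elu(x)+1 feature map; array names and sizes are illustrative, and the autoregressive (causal, cumulative-sum) variant from the paper is omitted.

import numpy as np

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1 used to linearise attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # O(N * d^2) instead of O(N^2 * d): phi(Q) @ (phi(K)^T V), normalised per query.
    Qp, Kp = elu_plus_one(Q), elu_plus_one(K)   # (N, d)
    kv = Kp.T @ V                               # (d, d) summary of all keys/values
    z = Qp @ Kp.sum(axis=0)                     # (N,) per-query normaliser
    return (Qp @ kv) / z[:, None]

N, d = 1024, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = linear_attention(Q, K, V)                 # (N, d), cost linear in N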

Generating radiology reports via memory-driven transformer

Z Chen, Y Song, TH Chang, X Wan - arXiv preprint arXiv:2010.16056, 2020 - arxiv.org
Medical imaging is frequently used in clinical practice and trials for diagnosis and treatment.
Writing imaging reports is time-consuming and can be error-prone for inexperienced …