Designing neural networks through neuroevolution

KO Stanley, J Clune, J Lehman… - Nature Machine …, 2019 - nature.com
Much of recent machine learning has focused on deep learning, in which neural network
weights are trained through variants of stochastic gradient descent. An alternative approach …
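
As a rough illustration of that alternative, the sketch below evolves the weights of a toy linear model with a simple mutate-and-select loop rather than gradient descent; the model, fitness function, and hyperparameters are illustrative assumptions, not the algorithm of the paper.

# Minimal neuroevolution sketch (toy setup assumed for illustration):
# weights are improved by mutation and selection instead of gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                      # toy inputs (assumption)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                                    # toy regression targets (assumption)

def fitness(w):
    return -np.mean((X @ w - y) ** 2)             # negative mean squared error

population = [rng.normal(size=3) for _ in range(20)]
for generation in range(100):
    parents = sorted(population, key=fitness, reverse=True)[:5]   # keep the fittest
    population = [p + 0.1 * rng.normal(size=3)                    # mutate to form offspring
                  for p in parents for _ in range(4)]

best = max(population, key=fitness)
print("best fitness:", fitness(best))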

Artificial neural networks for neuroscientists: a primer

GR Yang, XJ Wang - Neuron, 2020 - cell.com
Artificial neural networks (ANNs) are essential tools in machine learning that have drawn
increasing attention in neuroscience. Besides offering powerful techniques for data analysis …

Meta-learning in neural networks: A survey

T Hospedales, A Antoniou, P Micaelli… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
The field of meta-learning, or learning-to-learn, has seen a dramatic rise in interest in recent
years. Contrary to conventional approaches to AI where tasks are solved from scratch using …
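
One concrete instantiation of learning-to-learn, used here only to illustrate the idea (the survey covers many other families), is a first-order MAML-style loop in which an outer update improves an initialization so that a few inner gradient steps adapt quickly to each new task; the 1-D regression tasks and hyperparameters below are assumptions.

# First-order, MAML-style learning-to-learn sketch (illustrative assumptions:
# 1-D linear regression tasks, squared loss, single inner step).
import numpy as np

rng = np.random.default_rng(0)

def sample_task(n=20):
    slope = rng.uniform(0.5, 2.0)                        # a task = regression on y = slope * x
    x = rng.normal(size=n)
    return x, slope * x

def grad(w, x, y):
    return 2.0 * np.mean((w * x - y) * x)                # d/dw of mean squared error

w_init, inner_lr, outer_lr = 0.0, 0.05, 0.01
for step in range(2000):
    x, y = sample_task()
    xs, ys, xq, yq = x[:10], y[:10], x[10:], y[10:]      # support / query split
    w_adapted = w_init - inner_lr * grad(w_init, xs, ys) # fast adaptation to this task
    w_init -= outer_lr * grad(w_adapted, xq, yq)         # meta-update of the initialization

print("meta-learned initialization:", w_init)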

A survey of deep meta-learning

M Huisman, JN Van Rijn, A Plaat - Artificial Intelligence Review, 2021 - Springer
Deep neural networks can achieve great successes when presented with large data sets
and sufficient computational resources. However, their ability to learn new concepts quickly …

Frozen pretrained transformers as universal computation engines

K Lu, A Grover, P Abbeel, I Mordatch - Proceedings of the AAAI …, 2022 - ojs.aaai.org
We investigate the capability of a transformer pretrained on natural language to generalize
to other modalities with minimal finetuning--in particular, without finetuning of the self …
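
A sketch of the kind of selective finetuning described here, in PyTorch style: the pretrained self-attention and feedforward blocks are frozen and only the remaining parameters (for example, the input embedding and output head) stay trainable. The name-matching rules below are assumptions about how a given model's parameters are named, not the paper's code.

# Freeze a pretrained transformer's core and finetune only the I/O layers.
# Assumes a PyTorch module whose parameter names contain "attn"/"mlp" for the
# blocks to freeze; adjust the patterns to the actual model.
import torch

def freeze_core(model: torch.nn.Module):
    for name, param in model.named_parameters():
        # Freeze self-attention and feedforward weights; keep embeddings,
        # layer norms, and the output head trainable (one common variant).
        frozen = ("attn" in name) or ("mlp" in name)
        param.requires_grad = not frozen
    return [p for p in model.parameters() if p.requires_grad]

# trainable = freeze_core(pretrained_model)
# optimizer = torch.optim.Adam(trainable, lr=1e-4)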

A survey of meta-reinforcement learning

J Beck, R Vuorio, EZ Liu, Z Xiong, L Zintgraf… - arXiv preprint arXiv …, 2023 - arxiv.org
While deep reinforcement learning (RL) has fueled multiple high-profile successes in
machine learning, it is held back from more widespread adoption by its often poor data …

Random feature attention

H Peng, N Pappas, D Yogatama, R Schwartz… - arXiv preprint arXiv …, 2021 - arxiv.org
Transformers are state-of-the-art models for a variety of sequence modeling tasks. At their
core is an attention function which models pairwise interactions between the inputs at every …
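
The random-feature idea can be sketched in a few lines: replace the exponential in softmax attention with a feature map whose inner products approximate it in expectation, so the attention cost becomes linear in sequence length. The positive feature map used below is a standard choice but differs from the paper's trigonometric features; the dimensions are illustrative.

# Random-feature approximation of softmax attention (sketch; positive random
# features are assumed here instead of the paper's trigonometric features).
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 6, 8, 256            # sequence length, head dim, number of random features (assumed)
Q, K, V = (rng.normal(size=(n, d)) / d**0.25 for _ in range(3))
W = rng.normal(size=(d, m))    # random projection shared by queries and keys

def phi(X):
    # feature map with E[phi(q) @ phi(k)] = exp(q . k)
    return np.exp(X @ W - 0.5 * np.sum(X**2, axis=-1, keepdims=True)) / np.sqrt(m)

num = phi(Q) @ (phi(K).T @ V)              # (n, d): linear in sequence length
den = phi(Q) @ phi(K).sum(axis=0)          # replaces the softmax denominator
approx = num / den[:, None]

exact = np.exp(Q @ K.T)
exact = (exact / exact.sum(axis=1, keepdims=True)) @ V
print("max abs deviation from exact attention:", np.abs(approx - exact).max())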

Linear transformers are secretly fast weight programmers

I Schlag, K Irie, J Schmidhuber - International Conference on …, 2021 - proceedings.mlr.press
We show the formal equivalence of linearised self-attention mechanisms and fast weight
controllers from the early '90s, where a slow neural net learns by gradient descent to …
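
The equivalence the abstract points to can be made concrete: causal linearised attention maintains a "fast weight" matrix that is written to with a key-value outer product at every step and read out by the current query. The sketch below assumes an elu(x)+1 feature map and a plain additive update, leaving out the normalisation and delta-rule variants discussed in the paper.

# Causal linear attention as a fast weight program: the "slow" projections
# produce keys/values whose outer products update a fast weight matrix,
# which each query then reads out.
import numpy as np

def feature_map(x):
    return np.where(x > 0, x + 1.0, np.exp(x))      # elu(x) + 1, keeps features positive

def fast_weight_attention(Q, K, V):
    d_k, d_v = Q.shape[1], V.shape[1]
    W_fast = np.zeros((d_k, d_v))                   # fast weights, start empty
    z = np.zeros(d_k)                               # running normalizer
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        k, q = feature_map(K[t]), feature_map(Q[t])
        W_fast += np.outer(k, V[t])                 # "write": rank-1 fast weight update
        z += k
        out[t] = (q @ W_fast) / (q @ z + 1e-9)      # "read": query the fast weights
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 4)) for _ in range(3))
print(fast_weight_attention(Q, K, V))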

Stabilizing transformers for reinforcement learning

E Parisotto, F Song, J Rae, R Pascanu… - International …, 2020 - proceedings.mlr.press
Owing to their ability to both effectively integrate information over long time horizons and
scale to massive amounts of data, self-attention architectures have recently shown …

Meta-learning with warped gradient descent

S Flennerhag, AA Rusu, R Pascanu, F Visin… - arXiv preprint arXiv …, 2019 - arxiv.org
Learning an efficient update rule from data that promotes rapid learning of new tasks from
the same distribution remains an open problem in meta-learning. Typically, previous works …
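
As a loose sketch of what a learned update rule can look like, the gradient below is passed through a preconditioning "warp" before the step is applied; the fixed warp matrix P and the toy quadratic loss are stand-ins (assumptions), not the paper's meta-learned warp layers.

# Warped gradient step sketch: instead of w -= lr * grad, the gradient is first
# transformed by a warp. Here P is a fixed matrix standing in for learned warp
# layers; the ill-conditioned quadratic loss is an assumption.
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([10.0, 1.0])                 # quadratic loss 0.5 * w^T A w
P = np.diag([0.1, 1.0])                  # stand-in for a meta-learned warp/preconditioner
w = rng.normal(size=2)

for step in range(50):
    grad = A @ w                         # task-loss gradient
    w -= 0.1 * (P @ grad)                # warped update reshapes the descent geometry

print("final loss:", 0.5 * w @ A @ w)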