Why and when can deep-but not shallow-networks avoid the curse of dimensionality: a review

T Poggio, H Mhaskar, L Rosasco, B Miranda… - International Journal of …, 2017 - Springer
The paper reviews and extends an emerging body of theoretical results on deep learning
including the conditions under which it can be exponentially better than shallow learning. A …

Tensor networks for dimensionality reduction and large-scale optimization: Part 2 applications and future perspectives

A Cichocki, AH Phan, Q Zhao, N Lee… - … and Trends® in …, 2017 - nowpublishers.com
Part 2 of this monograph builds on the introduction to tensor networks and their operations
presented in Part 1. It focuses on tensor network models for super-compressed higher-order …

On the expressive power of deep learning: A tensor analysis

N Cohen, O Sharir, A Shashua - Conference on learning …, 2016 - proceedings.mlr.press
It has long been conjectured that hypotheses spaces suitable for data that is compositional
in nature, such as text or images, may be more efficiently represented with deep hierarchical …

Simple recurrent units for highly parallelizable recurrence

T Lei, Y Zhang, SI Wang, H Dai, Y Artzi - arXiv preprint arXiv:1709.02755, 2017 - arxiv.org
Common recurrent neural architectures scale poorly due to the intrinsic difficulty in
parallelizing their state computations. In this work, we propose the Simple Recurrent Unit …

Theoretical issues in deep networks

T Poggio, A Banburski, Q Liao - Proceedings of the …, 2020 - National Acad Sciences
While deep learning is successful in a number of applications, it is not yet well understood
theoretically. A theoretical characterization of deep learning should answer questions about …

Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity

A Daniely, R Frostig, Y Singer - Advances in neural …, 2016 - proceedings.neurips.cc
We develop a general duality between neural networks and compositional kernel Hilbert
spaces. We introduce the notion of a computation skeleton, an acyclic graph that succinctly …

Training RNNs as fast as CNNs

T Lei, Y Zhang, Y Artzi - 2018 - openreview.net
Common recurrent neural network architectures scale poorly due to the intrinsic difficulty in
parallelizing their state computations. In this work, we propose the Simple Recurrent Unit …

SGD learns the conjugate kernel class of the network

A Daniely - Advances in neural information processing …, 2017 - proceedings.neurips.cc
We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to
learn, in polynomial time, a function that is competitive with the best function in the conjugate …

Deriving neural architectures from sequence and graph kernels

T Lei, W Jin, R Barzilay… - … Conference on Machine …, 2017 - proceedings.mlr.press
The design of neural architectures for structured objects is typically guided by experimental
insights rather than a formal process. In this work, we appeal to kernels over combinatorial …

Deep randomized neural networks

C Gallicchio, S Scardapane - Recent Trends in Learning From Data …, 2020 - Springer
Randomized Neural Networks explore the behavior of neural systems where the
majority of connections are fixed, either in a stochastic or a deterministic fashion. Typical …