Why and when can deep-but not shallow-networks avoid the curse of dimensionality: a review
The paper reviews and extends an emerging body of theoretical results on deep learning
including the conditions under which it can be exponentially better than shallow learning. A …
Tensor networks for dimensionality reduction and large-scale optimization: Part 2 applications and future perspectives
Part 2 of this monograph builds on the introduction to tensor networks and their operations
presented in Part 1. It focuses on tensor network models for super-compressed higher-order …
On the expressive power of deep learning: A tensor analysis
It has long been conjectured that hypotheses spaces suitable for data that is compositional
in nature, such as text or images, may be more efficiently represented with deep hierarchical …
Simple recurrent units for highly parallelizable recurrence
Common recurrent neural architectures scale poorly due to the intrinsic difficulty in
parallelizing their state computations. In this work, we propose the Simple Recurrent Unit …
Theoretical issues in deep networks
While deep learning is successful in a number of applications, it is not yet well understood
theoretically. A theoretical characterization of deep learning should answer questions about …
Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity
We develop a general duality between neural networks and compositional kernel Hilbert
spaces. We introduce the notion of a computation skeleton, an acyclic graph that succinctly …
SGD learns the conjugate kernel class of the network
A Daniely - Advances in neural information processing …, 2017 - proceedings.neurips.cc
We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to
learn, in polynomial time, a function that is competitive with the best function in the conjugate …
Deriving neural architectures from sequence and graph kernels
The design of neural architectures for structured objects is typically guided by experimental
insights rather than a formal process. In this work, we appeal to kernels over combinatorial …
Deep randomized neural networks
C Gallicchio, S Scardapane - Recent Trends in Learning From Data …, 2020 - Springer
Randomized Neural Networks explore the behavior of neural systems where the
majority of connections are fixed, either in a stochastic or a deterministic fashion. Typical …