Data-dependent coresets for compressing neural networks with applications to generalization bounds
We present an efficient coreset-based neural network compression algorithm that sparsifies
the parameters of a trained fully-connected neural network in a manner that provably …
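The compression scheme described here boils down to importance sampling over a layer's edges. Below is a minimal sketch of that idea for one fully-connected layer, assuming a simplified data-dependent score |w_ij| * mean|a_j| in place of the paper's sensitivity terms; the function name and score are illustrative, not the authors' exact algorithm.

```python
import numpy as np

def coreset_sparsify_layer(W, A, m, seed=0):
    """Sparsify one weight matrix W (out x in) by importance sampling.

    A: sample activations (n x in) feeding the layer; m: samples to draw.
    The score |w_ij| * mean|a_j| is a simplified, data-dependent proxy
    for the paper's sensitivities (illustrative only).
    """
    rng = np.random.default_rng(seed)
    a_scale = np.abs(A).mean(axis=0)            # per-input magnitude
    s = np.abs(W) * a_scale[None, :]            # importance of each edge
    p = (s / s.sum()).ravel()                   # sampling distribution
    idx = rng.choice(p.size, size=m, replace=True, p=p)
    W_hat = np.zeros(W.size)
    np.add.at(W_hat, idx, W.ravel()[idx] / (m * p[idx]))  # unbiased reweighting
    return W_hat.reshape(W.shape)

W = np.random.randn(64, 128)
A = np.random.rand(1000, 128)
W_sparse = coreset_sparsify_layer(W, A, m=2000)
print(np.count_nonzero(W_sparse), "nonzeros kept of", W.size)
```

Sampling with replacement and dividing each kept weight by m * p keeps the sparsified layer an unbiased estimator of the original pre-activations, which is the kind of property such guarantees build on.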
The unreasonable effectiveness of structured random orthogonal embeddings
KM Choromanski, M Rowland… - Advances in Neural Information Processing Systems, 2017 - proceedings.neurips.cc
We examine a class of embeddings based on structured random matrices with orthogonal
rows which can be applied in many machine learning applications including dimensionality …
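The core contrast studied here, iid Gaussian rows versus orthogonal rows with matched norms, is easy to reproduce. A sketch under those assumptions, using a QR-based construction (not necessarily the authors' exact estimator):

```python
import numpy as np

def gaussian_orthogonal_matrix(m, d, rng):
    """Random m x d matrix with orthogonal rows (per d-row block), each row
    rescaled so its norm is distributed like an iid Gaussian row's."""
    blocks, rows = [], m
    while rows > 0:
        Q, _ = np.linalg.qr(rng.standard_normal((d, d)))     # orthonormal rows
        norms = np.linalg.norm(rng.standard_normal((d, d)), axis=1)
        k = min(rows, d)
        blocks.append(Q[:k] * norms[:k, None])
        rows -= d
    return np.vstack(blocks)

rng = np.random.default_rng(0)
d = m = 64
x = rng.standard_normal(d)
for name, S in [("iid", rng.standard_normal((m, d))),
                ("orthogonal", gaussian_orthogonal_matrix(m, d, rng))]:
    est = np.linalg.norm(S @ x) ** 2 / m        # JL-style squared-norm estimate
    print(name, "relative error:", abs(est - x @ x) / (x @ x))
```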
On the expressive power of self-attention matrices
Transformer networks are able to capture patterns in data coming from many domains (text,
images, videos, proteins, etc.) with little or no change to architecture components. We …
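The object under study is the n x n row-stochastic self-attention matrix softmax(QK^T / sqrt(d)). A minimal sketch of how that matrix is formed (variable names are illustrative):

```python
import numpy as np

def self_attention_matrix(X, Wq, Wk):
    """Row-stochastic n x n attention matrix softmax(Q K^T / sqrt(d))."""
    Q, K = X @ Wq, X @ Wk
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(scores)
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 8, 16
X = rng.standard_normal((n, d))
A = self_attention_matrix(X, rng.standard_normal((d, d)), rng.standard_normal((d, d)))
print(A.shape, A.sum(axis=1))   # each row sums to 1
```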
Sensitivity-informed provable pruning of neural networks
We introduce a family of pruning algorithms that sparsifies the parameters of a trained model
in a way that approximately preserves the model's predictive accuracy. Our algorithms use a …
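A simplified illustration of data-informed pruning: score each edge by its largest observed share of its neuron's pre-activation over a data sample, then keep only the top fraction. The score is a hypothetical stand-in for the paper's sensitivity definition, not the authors' algorithm.

```python
import numpy as np

def prune_by_sensitivity(W, A, keep_frac=0.1):
    """Zero all edges of W (out x in) except the most 'sensitive' ones.

    Here an edge's sensitivity is its maximum share, over the sample A
    (n x in), of the absolute contributions to its neuron (illustrative)."""
    contrib = np.abs(W[None, :, :] * A[:, None, :])          # n x out x in
    share = contrib / (contrib.sum(axis=2, keepdims=True) + 1e-12)
    s = share.max(axis=0)                                    # out x in
    k = int(keep_frac * W.size)
    thresh = np.partition(s.ravel(), -k)[-k]                 # k-th largest score
    return np.where(s >= thresh, W, 0.0)

W = np.random.randn(32, 64)
A = np.random.rand(200, 64)
W_pruned = prune_by_sensitivity(W, A, keep_frac=0.1)
print(np.count_nonzero(W_pruned), "of", W.size, "weights kept")
```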
The geometry of random features
We present an in-depth examination of the effectiveness of radial basis function kernel
(beyond Gaussian) estimators based on orthogonal random feature maps. We show that …
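For the Gaussian kernel specifically, the effect is easy to observe: build random Fourier features from iid frequencies and from orthogonal frequencies with matched row norms, then compare kernel-approximation error; the orthogonal variant is typically more accurate. A sketch under those assumptions (bandwidth 1, m <= d):

```python
import numpy as np

def rff(X, W):
    """Random Fourier feature map for the Gaussian kernel exp(-||x-y||^2 / 2)."""
    Z = X @ W.T
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(W.shape[0])

rng = np.random.default_rng(1)
d, m, n = 16, 16, 200
X = rng.standard_normal((n, d))
K = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))  # exact kernel

W_iid = rng.standard_normal((m, d))
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
norms = np.linalg.norm(rng.standard_normal((m, d)), axis=1)
W_ort = Q[:m] * norms[:, None]      # orthogonal rows, Gaussian-like row norms

for name, W in [("iid", W_iid), ("orthogonal", W_ort)]:
    Phi = rff(X, W)
    print(name, "kernel MSE:", np.mean((Phi @ Phi.T - K) ** 2))
```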
Recycling randomness with structure for sublinear time kernel expansions
K Choromanski, V Sindhwani - International Conference on Machine Learning, 2016 - proceedings.mlr.press
We propose a scheme for recycling Gaussian random vectors into structured matrices to
approximate various kernel functions in sublinear time via random embeddings. Our framework …
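The recycling idea is easiest to see with a circulant matrix: one Gaussian vector g defines all d rows of C(g), and C(g)x costs O(d log d) via the FFT instead of O(d^2). A minimal sketch of that building block (not the paper's full framework):

```python
import numpy as np

def circulant_project(x, g):
    """Compute C(g) @ x in O(d log d) via the circular convolution theorem,
    where C(g) is the circulant matrix whose first column is g."""
    return np.real(np.fft.ifft(np.fft.fft(g) * np.fft.fft(x)))

rng = np.random.default_rng(0)
d = 8
g, x = rng.standard_normal(d), rng.standard_normal(d)

fast = circulant_project(x, g)
C = np.stack([np.roll(g, i) for i in range(d)], axis=1)  # explicit circulant
print(np.allclose(fast, C @ x))                          # True
```

One stored vector thus stands in for a full d x d projection, which is where the memory and time savings come from.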
Structured adaptive and random spinners for fast machine learning computations
M Bojarski, A Choromanska… - Artificial Intelligence and Statistics, 2017 - proceedings.mlr.press
We consider an efficient computational framework for speeding up several machine learning
algorithms with almost no loss of accuracy. The proposed framework relies on projections …
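Spinner-style projections compose blocks of the form HD, a Hadamard matrix times a random ±1 diagonal, each applicable in O(d log d) via the fast Walsh-Hadamard transform. A sketch of a three-block spinner HD3 HD2 HD1 (illustrative normalization, not the authors' exact construction):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (unnormalized); len(x) a power of 2."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

def spinner(x, diags):
    """Apply HD3 HD2 HD1 to x, where each D is a random sign diagonal."""
    for D in diags:
        x = fwht(D * x)
    return x / np.sqrt(len(x)) ** 3     # undo the three unnormalized H's

rng = np.random.default_rng(0)
d = 16
diags = [rng.choice([-1.0, 1.0], size=d) for _ in range(3)]
x = rng.standard_normal(d)
y = spinner(x, diags)
print(np.isclose(np.linalg.norm(x), np.linalg.norm(y)))  # norm is preserved
```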
FROSH: FasteR Online Sketching Hashing
Many hashing methods, especially those in the data-dependent category with good
learning accuracy, are still inefficient when dealing with three critical problems in modern …
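Online sketching hashing maintains a small sketch of the data stream and derives hash projections from it rather than from the full data. The sketch below illustrates that setting with the frequent-directions algorithm plus sign projections onto the sketch's top directions; it shows the flavor of the approach, not the FROSH algorithm itself.

```python
import numpy as np

def frequent_directions(stream, ell):
    """Streaming ell x d sketch B with B^T B approximating X^T X."""
    B = np.zeros((2 * ell, stream.shape[1]))
    nxt = 0
    for row in stream:
        if nxt == 2 * ell:                                  # sketch full: shrink
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s = np.sqrt(np.maximum(s**2 - s[ell - 1] ** 2, 0.0))
            B[: len(s)] = s[:, None] * Vt
            B[len(s):] = 0.0
            nxt = ell - 1                                   # rows below are now zero
        B[nxt] = row
        nxt += 1
    return B[:ell]

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 32))
B = frequent_directions(X, ell=8)
_, _, Vt = np.linalg.svd(B, full_matrices=False)
codes = (X @ Vt[:8].T > 0).astype(np.uint8)   # 8-bit binary code per point
print(codes.shape)
```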
Binary vectors for fast distance and similarity estimation
DA Rachkovskij - Cybernetics and Systems Analysis, 2017 - Springer
This review considers methods and algorithms for fast estimation of distance/similarity
measures between initial data from vector representations with binary or integer-valued …
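One classical member of this family is sign random projections (SimHash), where the normalized Hamming distance between two codes is an unbiased estimator of the angle between the original vectors divided by pi. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 4096                        # m bits per code
R = rng.standard_normal((m, d))        # shared random hyperplanes

def code(x):
    return R @ x > 0                   # m-bit binary representation

x = rng.standard_normal(d)
y = x + 0.5 * rng.standard_normal(d)

ham = np.count_nonzero(code(x) ^ code(y))
angle_est = np.pi * ham / m            # E[ham / m] = angle / pi
angle_true = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(angle_est, angle_true)
```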
On binary embedding using circulant matrices
Binary embeddings provide efficient and powerful ways to perform operations on large-scale
data. However, binary embedding typically requires long codes in order to preserve the …
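A sketch of the circulant construction: apply random sign flips, project with C(g) using the FFT (only the single vector g is stored), and binarize; the Hamming distance between codes again estimates the angle. Parameters here are illustrative, not the paper's tuned setup.

```python
import numpy as np

def circulant_binary_embed(x, g, D):
    """sign(C(g) D x) via FFT: O(d) memory and O(d log d) time,
    versus O(d^2) for an unstructured projection matrix."""
    v = np.real(np.fft.ifft(np.fft.fft(g) * np.fft.fft(D * x)))
    return v > 0

rng = np.random.default_rng(0)
d = 256
g = rng.standard_normal(d)             # one Gaussian vector defines C(g)
D = rng.choice([-1.0, 1.0], size=d)    # random signs decorrelate the rows

x = rng.standard_normal(d)
y = x + 0.3 * rng.standard_normal(d)
ham = np.count_nonzero(circulant_binary_embed(x, g, D) ^ circulant_binary_embed(y, g, D))
angle_true = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(np.pi * ham / d, angle_true)     # Hamming-based angle estimate vs. truth
```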