Linear algorithms for online multitask classification
G Cavallanti, N Cesa-Bianchi, C Gentile - The Journal of Machine Learning …, 2010 - jmlr.org
We introduce new Perceptron-based algorithms for the online multitask binary classification
problem. Under suitable regularity conditions, our algorithms are shown to improve on their …
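A minimal sketch of the shared-update idea behind multitask Perceptrons: a mistake on one task also nudges the weight vectors of the other tasks through a fixed interaction matrix. The matrix chosen here and all names (multitask_perceptron, A_inv, b) are illustrative, not the paper's construction:

import numpy as np

def multitask_perceptron(examples, K, d, b=1.0):
    """Sketch: K tasks share information through an interaction matrix.
    `examples` yields (task_id, x, y) with y in {-1, +1}."""
    # Illustrative interaction: large on the diagonal, small off it.
    A_inv = (np.eye(K) + b * np.ones((K, K)) / K) / (1.0 + b)
    W = np.zeros((K, d))                 # one weight vector per task
    mistakes = 0
    for i, x, y in examples:
        if y * (W[i] @ x) <= 0:          # mistake on task i
            mistakes += 1
            for j in range(K):           # shared update, scaled by A_inv
                W[j] += A_inv[j, i] * y * x
    return W, mistakes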
Reparameterizing mirror descent as gradient descent
E Amid, MKK Warmuth - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Most of the recent successful applications of neural networks have been based on training
with gradient descent updates. However, for some small networks, other mirror descent …
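One instance of the paper's theme can be checked numerically: the unnormalized exponentiated gradient update agrees, to first order in the step size, with plain gradient descent after the reparameterization w = u²/4. A minimal sketch (the constant 1/4 follows the standard EGU reparameterization; treat this as illustrative rather than the paper's exact construction):

import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=5)            # a fixed gradient w.r.t. w
eta = 0.01
w = np.full(5, 0.5)

# Unnormalized exponentiated gradient (a mirror descent update):
w_egu = w * np.exp(-eta * g)

# Gradient descent after the reparameterization w = u**2 / 4:
u = 2 * np.sqrt(w)                # chosen so that u**2 / 4 == w
u_gd = u - eta * (u / 2) * g      # chain rule: dL/du = (u/2) * dL/dw
w_gd = u_gd**2 / 4

# Tiny for small eta: the two updates match to first order.
print(np.max(np.abs(w_egu - w_gd)))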
Open Problem: Learning sparse linear concepts by priming the features
MK Warmuth, E Amid - The Thirty Sixth Annual Conference …, 2023 - proceedings.mlr.press
Sparse linear problems can be learned well with online multiplicative updates. The question
is whether there are closed form updates based on the past examples that can sample …
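The multiplicative updates the snippet refers to include EG± (exponentiated gradient with positive and negative weight pools), which concentrates quickly on the few relevant features of a sparse target. A minimal sketch; the function name, constants, and the toy 1-sparse problem are illustrative:

import numpy as np

def eg_pm(X, y, eta=0.05, U=2.0, epochs=100):
    """Sketch of EG±: weights are w = U * (p - q) with the pools p, q
    kept on a shared simplex and updated multiplicatively."""
    n, d = X.shape
    p = np.full(d, 1.0 / (2 * d))
    q = np.full(d, 1.0 / (2 * d))
    for _ in range(epochs):
        for x, t in zip(X, y):
            w = U * (p - q)
            g = (w @ x - t) * x            # squared-loss gradient w.r.t. w
            p_new = p * np.exp(-eta * U * g)
            q_new = q * np.exp(+eta * U * g)
            z = p_new.sum() + q_new.sum()  # renormalize both pools together
            p, q = p_new / z, q_new / z
    return U * (p - q)

# Illustrative use: a 1-sparse target in 100 dimensions.
rng = np.random.default_rng(1)
X = rng.choice([-1.0, 1.0], size=(40, 100))
y = X[:, 0]                                # target is the first feature
w = eg_pm(X, y)
print(np.argmax(np.abs(w)))                # largest weight should sit on coordinate 0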
A unifying view of representer theorems
A Argyriou, F Dinuzzo - International Conference on …, 2014 - proceedings.mlr.press
It is known that the solution of regularization and interpolation problems with Hilbertian
penalties can be expressed as a linear combination of the data. This very useful property …
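The property is easy to make concrete: for kernel ridge regression the representer theorem gives the solution as f(x) = Σᵢ αᵢ k(xᵢ, x) with coefficients in closed form, α = (K + λI)⁻¹ y. A minimal numpy sketch (the Gaussian kernel and all names are chosen for illustration):

import numpy as np

def kernel_ridge(X, y, lam=1e-2, gamma=1.0):
    """Representer theorem in action: the penalized minimizer is a
    linear combination of the training data."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                         # Gaussian kernel matrix
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    def predict(Z):
        sq = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq) @ alpha
    return predict

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
y = np.sin(X[:, 0])
f = kernel_ridge(X, y)
print(f(X[:5]))                                     # fitted values at training points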
Online matrix completion with side information
We give an online algorithm and prove novel mistake and regret bounds for online binary
matrix completion with side information. The mistake bounds we prove are of the form Õ …
Characterizing the representer theorem
The representer theorem assures that kernel methods retain optimality under penalized
empirical risk minimization. While a sufficient condition on the form of the regularizer …
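The condition alluded to can be stated compactly; as commonly formulated in this line of work (the paper's exact regularity assumptions may differ), a regularizer admits the representer theorem for all data sets if and only if it is a nondecreasing function of the Hilbert norm:

\min_{w \in \mathcal{H}} \; \sum_{i=1}^{m} \ell\big(\langle w, x_i \rangle, y_i\big) + \Omega(w)
\quad \text{has a minimizer} \quad
w^\star = \sum_{i=1}^{m} \alpha_i x_i \;\; \text{for every sample}
\iff
\Omega(w) = h(\lVert w \rVert) \text{ for some nondecreasing } h : [0, \infty) \to \mathbb{R}.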
The limits of squared Euclidean distance regularization
M Derezinski, MKK Warmuth - Advances in Neural …, 2014 - proceedings.neurips.cc
Some of the simplest loss functions considered in Machine Learning are the square loss, the
logistic loss and the hinge loss. The most common family of algorithms, including Gradient …
A case where a spindly two-layer linear network decisively outperforms any neural network with a fully connected input layer
MK Warmuth, W Kotłowski… - Algorithmic Learning …, 2021 - proceedings.mlr.press
It was conjectured that any neural network of any structure and arbitrary differentiable
transfer functions at the nodes cannot learn the following problem sample efficiently when …
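The "spindly" network gives each input feature its own private chain of two weights, so the effective weight is wᵢ = uᵢ·vᵢ and plain gradient descent on (u, v) behaves multiplicatively in w, favoring sparse solutions. A minimal sketch under those assumptions; the name spindly_fit, the constants, and the toy problem are illustrative:

import numpy as np

def spindly_fit(X, y, eta=0.05, epochs=200):
    """Sketch of the spindly two-layer linear network: effective weight
    w_i = u_i * v_i, trained by plain gradient descent on (u, v)."""
    d = X.shape[1]
    u = np.full(d, 0.1)
    v = np.full(d, 0.1)
    for _ in range(epochs):
        r = X @ (u * v) - y                # residuals of the squared loss
        g = X.T @ r / len(y)               # gradient w.r.t. effective w
        u, v = u - eta * g * v, v - eta * g * u   # simultaneous update
    return u * v

rng = np.random.default_rng(2)
X = rng.choice([-1.0, 1.0], size=(60, 100))
y = X[:, 0]
w = spindly_fit(X, y)
print(np.argmax(np.abs(w)))                # mass concentrates on coordinate 0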
Online Matrix Completion with Side Information
FYL Tse - 2023 - discovery.ucl.ac.uk
This thesis considers the problem of binary matrix completion with side information in the
online setting and the applications thereof. The side information provides additional …
Tempered Bregman Divergence for Continuous and Discrete Time Mirror Descent and Robust Classification
E Amid - 2020 - search.proquest.com
Bregman divergence is an important class of divergence functions in Machine Learning.
Many well-known updates including gradient descent and (un)normalized exponentiated …
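The divergences the abstract mentions share one compact definition, D_F(w, v) = F(w) − F(v) − ⟨∇F(v), w − v⟩. A minimal sketch of the two classical instances (squared Euclidean distance behind gradient descent, relative entropy behind exponentiated gradient); the helper name bregman is illustrative:

import numpy as np

def bregman(F, grad_F, w, v):
    """Bregman divergence D_F(w, v) = F(w) - F(v) - <grad F(v), w - v>."""
    return F(w) - F(v) - grad_F(v) @ (w - v)

# F(w) = 0.5 * ||w||^2 gives half the squared Euclidean distance ...
sq = lambda w: 0.5 * w @ w
sq_grad = lambda w: w

# ... while F(w) = sum(w log w - w) gives the unnormalized relative
# entropy, the divergence behind exponentiated gradient updates.
ent = lambda w: np.sum(w * np.log(w) - w)
ent_grad = lambda w: np.log(w)

w = np.array([0.2, 0.5, 0.3])
v = np.array([0.4, 0.4, 0.2])
print(bregman(sq, sq_grad, w, v))    # equals 0.5 * ||w - v||^2
print(bregman(ent, ent_grad, w, v))  # equals sum(w * log(w/v) - w + v)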