[PDF] Linear algorithms for online multitask classification

G Cavallanti, N Cesa-Bianchi, C Gentile - The Journal of Machine Learning …, 2010 - jmlr.org
We introduce new Perceptron-based algorithms for the online multitask binary classification
problem. Under suitable regularity conditions, our algorithms are shown to improve on their …

Reparameterizing mirror descent as gradient descent

E Amid, MKK Warmuth - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Most of the recent successful applications of neural networks have been based on training
with gradient descent updates. However, for some small networks, other mirror descent …
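As a reminder of the simplest instance behind this line of work (a standard equivalence, not the paper's general construction): the continuous-time unnormalized exponentiated-gradient flow ẇ = -η w ⊙ ∇L(w) is exactly gradient descent on a variable u with w = u²/4. A minimal numerical sketch, with a made-up least-squares loss, dimensions, and step size chosen only for illustration:

```python
import numpy as np

# Toy least-squares loss L(w) = 0.5 * ||X w - y||^2 (made-up data for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0])

def grad(w):
    return X.T @ (X @ w - y)

eta = 1e-3
w_egu = np.full(5, 0.25)          # EGU iterate (kept positive)
u = 2.0 * np.sqrt(w_egu)          # reparameterization w = u**2 / 4

for _ in range(2000):
    # Unnormalized exponentiated gradient (a mirror descent update).
    w_egu *= np.exp(-eta * grad(w_egu))
    # Plain gradient descent on u, with the loss composed with w = u**2 / 4.
    w_gd = u**2 / 4.0
    u -= eta * (u / 2.0) * grad(w_gd)   # chain rule: dL/du = (u/2) * dL/dw

print(np.max(np.abs(w_egu - u**2 / 4.0)))   # small gap for small eta
```

The point of the sketch is only that the multiplicative update is recovered by an additive gradient step after squaring the parameters; the step size and data above carry no significance.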

Open Problem: Learning sparse linear concepts by priming the features

MK Warmuth, E Amid - The Thirty Sixth Annual Conference …, 2023 - proceedings.mlr.press
Sparse linear problems can be learned well with online multiplicative updates. The question
is whether there are closed-form updates based on the past examples that can sample …
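The first sentence refers to a classical fact: Winnow is the textbook multiplicative update whose mistake bound for k-sparse (monotone disjunction) targets over n features scales as O(k log n) rather than O(n). A minimal sketch, where the data distribution and constants are illustrative and not taken from the paper:

```python
import numpy as np

# Winnow (a multiplicative-update algorithm) learning a k-literal monotone
# disjunction over n Boolean features; its mistake bound is O(k log n).
rng = np.random.default_rng(2)
n, k, T = 1000, 3, 5000
relevant = rng.choice(n, size=k, replace=False)

w = np.ones(n)                     # multiplicative weights
theta = n                          # threshold
mistakes = 0
for _ in range(T):
    x = (rng.random(n) < 0.05).astype(float)
    y = 1 if x[relevant].any() else 0
    yhat = 1 if w @ x >= theta else 0
    if yhat != y:
        mistakes += 1
        # Promote active features on false negatives, demote on false positives;
        # inactive features (x_i = 0) keep their weights unchanged.
        w *= 2.0 ** ((y - yhat) * x)

print(mistakes)                    # grows like O(k log n), far below n
```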

A unifying view of representer theorems

A Argyriou, F Dinuzzo - International Conference on …, 2014 - proceedings.mlr.press
It is known that the solution of regularization and interpolation problems with Hilbertian
penalties can be expressed as a linear combination of the data. This very useful property …
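For context, the property referred to is the classical representer theorem. In its standard textbook form (not quoted from this paper), any minimizer of a regularized empirical risk over an RKHS with kernel K, with a penalty that is a nondecreasing function of the Hilbert norm, lies in the span of the training data:

```latex
\min_{f \in \mathcal{H}} \; \sum_{i=1}^{m} \ell\bigl(f(x_i), y_i\bigr) + h\bigl(\lVert f \rVert_{\mathcal{H}}\bigr)
\quad\Longrightarrow\quad
f^{\star}(\cdot) = \sum_{i=1}^{m} c_i \, K(x_i, \cdot) \;\; \text{for some } c \in \mathbb{R}^{m},
\qquad h \text{ nondecreasing.}
```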

Online matrix completion with side information

M Herbster, S Pasteris, L Tse - Advances in Neural …, 2020 - proceedings.neurips.cc
We give an online algorithm and prove novel mistake and regret bounds for online binary
matrix completion with side information. The mistake bounds we prove are of the form Õ …

Characterizing the representer theorem

Y Yu, H Cheng, D Schuurmans… - … on machine learning, 2013 - proceedings.mlr.press
The representer theorem assures that kernel methods retain optimality under penalized
empirical risk minimization. While a sufficient condition on the form of the regularizer …

The limits of squared Euclidean distance regularization

M Derezinski, MKK Warmuth - Advances in Neural …, 2014 - proceedings.neurips.cc
Some of the simplest loss functions considered in Machine Learning are the square loss, the
logistic loss and the hinge loss. The most common family of algorithms, including Gradient …

A case where a spindly two-layer linear network decisively outperforms any neural network with a fully connected input layer

MK Warmuth, W Kotłowski… - Algorithmic Learning …, 2021 - proceedings.mlr.press
It was conjectured that any neural network of any structure and arbitrary differentiable
transfer functions at the nodes cannot learn the following problem sample efficiently when …
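For context (the architecture is the one named in the title; the data, loss, and step size below are illustrative assumptions): the spindly network routes each input x_i through its own private hidden unit, so it computes ŷ = Σ_i u_i v_i x_i, and gradient descent on the factors (u, v) acts multiplicatively on the effective weights w_i = u_i v_i, which is what lets it pick out a sparse target. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 64, 32
X = rng.choice([-1.0, 1.0], size=(m, n))   # illustrative +/-1 features
y = X[:, 0]                                # 1-sparse target: the first feature

# Spindly two-layer linear net: one edge u_i into a private hidden unit,
# one edge v_i out of it, so the prediction is sum_i u_i * v_i * x_i.
u = np.full(n, 0.1)
v = np.full(n, 0.1)
eta = 0.05

for _ in range(500):
    w = u * v                              # effective linear weights
    r = X @ w - y                          # residual
    g = X.T @ r / m                        # gradient of the mean squared loss w.r.t. w
    u, v = u - eta * g * v, v - eta * g * u   # chain rule through w_i = u_i * v_i

print(np.round(u * v, 2)[:5])   # weight on feature 0 grows toward 1, the rest stay near 0
```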

Online Matrix Completion with Side Information

FYL Tse - 2023 - discovery.ucl.ac.uk
This thesis considers the problem of binary matrix completion with side information in the
online setting and the applications thereof. The side information provides additional …

[BOOK] Tempered Bregman Divergence for Continuous and Discrete Time Mirror Descent and Robust Classification

E Amid - 2020 - search.proquest.com
Bregman divergence is an important class of divergence functions in Machine Learning.
Many well-known updates including gradient descent and (un)normalized exponentiated …
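For reference (standard definitions, not specific to this thesis), the Bregman divergence generated by a strictly convex F, and the two classical choices of F behind the updates mentioned in the snippet, are:

```latex
D_F(w, w') = F(w) - F(w') - \nabla F(w')^{\top}(w - w'),
```
```latex
F(w) = \tfrac{1}{2}\lVert w \rVert_2^2 \;\Rightarrow\; \text{gradient descent}, \qquad
F(w) = \sum_i \bigl(w_i \log w_i - w_i\bigr) \;\Rightarrow\; \text{unnormalized exponentiated gradient.}
```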