Linear algorithms for online multitask classification
G Cavallanti, N Cesa-Bianchi, C Gentile - The Journal of Machine Learning …, 2010 - jmlr.org
We introduce new Perceptron-based algorithms for the online multitask binary classification
problem. Under suitable regularity conditions, our algorithms are shown to improve on their …
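A minimal sketch of the shared-update idea behind multitask Perceptrons: a mistake on one task also nudges the weight vectors of the other tasks through a fixed interaction matrix. The matrix chosen here and all names (multitask_perceptron, A_inv, b) are illustrative, not the paper's construction:

import numpy as np

def multitask_perceptron(examples, K, d, b=1.0):
    """Sketch: K tasks share information through an interaction matrix.
    `examples` yields (task_id, x, y) with y in {-1, +1}."""
    # Illustrative interaction: large on the diagonal, small off it.
    A_inv = (np.eye(K) + b * np.ones((K, K)) / K) / (1.0 + b)
    W = np.zeros((K, d))                 # one weight vector per task
    mistakes = 0
    for i, x, y in examples:
        if y * (W[i] @ x) <= 0:          # mistake on task i
            mistakes += 1
            for j in range(K):           # shared update, scaled by A_inv
                W[j] += A_inv[j, i] * y * x
    return W, mistakes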
Reparameterizing mirror descent as gradient descent
E Amid, MKK Warmuth - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Most of the recent successful applications of neural networks have been based on training
with gradient descent updates. However, for some small networks, other mirror descent …
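One instance of the paper's theme can be checked numerically: the unnormalized exponentiated gradient update agrees, to first order in the step size, with plain gradient descent after the reparameterization w = u²/4. A minimal sketch (the constant 1/4 follows the standard EGU reparameterization; treat this as illustrative rather than the paper's exact construction):

import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=5)            # a fixed gradient w.r.t. w
eta = 0.01
w = np.full(5, 0.5)

# Unnormalized exponentiated gradient (a mirror descent update):
w_egu = w * np.exp(-eta * g)

# Gradient descent after the reparameterization w = u**2 / 4:
u = 2 * np.sqrt(w)                # chosen so that u**2 / 4 == w
u_gd = u - eta * (u / 2) * g      # chain rule: dL/du = (u/2) * dL/dw
w_gd = u_gd**2 / 4

# Tiny for small eta: the two updates match to first order.
print(np.max(np.abs(w_egu - w_gd)))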
Open Problem: Learning sparse linear concepts by priming the features
MK Warmuth, E Amid - The Thirty Sixth Annual Conference …, 2023 - proceedings.mlr.press
Sparse linear problems can be learned well with online multiplicative updates. The question
is whether there are closed form updates based on the past examples that can sample …
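The multiplicative updates the snippet refers to include EG± (exponentiated gradient with positive and negative weight pools), which concentrates quickly on the few relevant features of a sparse target. A minimal sketch; the function name, constants, and the toy 1-sparse problem are illustrative:

import numpy as np

def eg_pm(X, y, eta=0.05, U=2.0, epochs=100):
    """Sketch of EG±: weights are w = U * (p - q) with the pools p, q
    kept on a shared simplex and updated multiplicatively."""
    n, d = X.shape
    p = np.full(d, 1.0 / (2 * d))
    q = np.full(d, 1.0 / (2 * d))
    for _ in range(epochs):
        for x, t in zip(X, y):
            w = U * (p - q)
            g = (w @ x - t) * x            # squared-loss gradient w.r.t. w
            p_new = p * np.exp(-eta * U * g)
            q_new = q * np.exp(+eta * U * g)
            z = p_new.sum() + q_new.sum()  # renormalize both pools together
            p, q = p_new / z, q_new / z
    return U * (p - q)

# Illustrative use: a 1-sparse target in 100 dimensions.
rng = np.random.default_rng(1)
X = rng.choice([-1.0, 1.0], size=(40, 100))
y = X[:, 0]                                # target is the first feature
w = eg_pm(X, y)
print(np.argmax(np.abs(w)))                # largest weight should sit on coordinate 0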
A unifying view of representer theorems
A Argyriou, F Dinuzzo - International Conference on …, 2014 - proceedings.mlr.press
It is known that the solution of regularization and interpolation problems with Hilbertian
penalties can be expressed as a linear combination of the data. This very useful property …
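The property is easy to make concrete: for kernel ridge regression the representer theorem gives the solution as f(x) = Σᵢ αᵢ k(xᵢ, x) with coefficients in closed form, α = (K + λI)⁻¹ y. A minimal numpy sketch (the Gaussian kernel and all names are chosen for illustration):

import numpy as np

def kernel_ridge(X, y, lam=1e-2, gamma=1.0):
    """Representer theorem in action: the penalized minimizer is a
    linear combination of the training data."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                         # Gaussian kernel matrix
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    def predict(Z):
        sq = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq) @ alpha
    return predict

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
y = np.sin(X[:, 0])
f = kernel_ridge(X, y)
print(f(X[:5]))                                     # fitted values at training points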
Online matrix completion with side information
We give an online algorithm and prove novel mistake and regret bounds for online binary
matrix completion with side information. The mistake bounds we prove are of the form Õ …
Characterizing the representer theorem
The representer theorem assures that kernel methods retain optimality under penalized
empirical risk minimization. While a sufficient condition on the form of the regularizer …
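The condition alluded to can be stated compactly; as commonly formulated in this line of work (the paper's exact regularity assumptions may differ), a regularizer admits the representer theorem for all data sets if and only if it is a nondecreasing function of the Hilbert norm:

\min_{w \in \mathcal{H}} \; \sum_{i=1}^{m} \ell\big(\langle w, x_i \rangle, y_i\big) + \Omega(w)
\quad \text{has a minimizer} \quad
w^\star = \sum_{i=1}^{m} \alpha_i x_i \;\; \text{for every sample}
\iff
\Omega(w) = h(\lVert w \rVert) \text{ for some nondecreasing } h : [0, \infty) \to \mathbb{R}.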
The limits of squared Euclidean distance regularization
M Derezinski, MKK Warmuth - Advances in Neural …, 2014 - proceedings.neurips.cc
Some of the simplest loss functions considered in Machine Learning are the square loss, the
logistic loss and the hinge loss. The most common family of algorithms, including Gradient …
A case where a spindly two-layer linear network decisively outperforms any neural network with a fully connected input layer
MK Warmuth, W Kotłowski… - Algorithmic Learning …, 2021 - proceedings.mlr.press
It was conjectured that any neural network of any structure and arbitrary differentiable
transfer functions at the nodes cannot learn the following problem sample efficiently when …
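The "spindly" network gives each input feature its own private chain of two weights, so the effective weight is wᵢ = uᵢ·vᵢ and plain gradient descent on (u, v) behaves multiplicatively in w, favoring sparse solutions. A minimal sketch under those assumptions; the name spindly_fit, the constants, and the toy problem are illustrative:

import numpy as np

def spindly_fit(X, y, eta=0.05, epochs=200):
    """Sketch of the spindly two-layer linear network: effective weight
    w_i = u_i * v_i, trained by plain gradient descent on (u, v)."""
    d = X.shape[1]
    u = np.full(d, 0.1)
    v = np.full(d, 0.1)
    for _ in range(epochs):
        r = X @ (u * v) - y                # residuals of the squared loss
        g = X.T @ r / len(y)               # gradient w.r.t. effective w
        u, v = u - eta * g * v, v - eta * g * u   # simultaneous update
    return u * v

rng = np.random.default_rng(2)
X = rng.choice([-1.0, 1.0], size=(60, 100))
y = X[:, 0]
w = spindly_fit(X, y)
print(np.argmax(np.abs(w)))                # mass concentrates on coordinate 0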
Online Matrix Completion with Side Information
FYL Tse - 2023 - discovery.ucl.ac.uk
This thesis considers the problem of binary matrix completion with side information in the
online setting and the applications thereof. The side information provides additional …
Tempered Bregman Divergence for Continuous and Discrete Time Mirror Descent and Robust Classification
E Amid - 2020 - search.proquest.com
Bregman divergence is an important class of divergence functions in Machine Learning.
Many well-known updates including gradient descent and (un)normalized exponentiated …
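The divergences the abstract mentions share one compact definition, D_F(w, v) = F(w) − F(v) − ⟨∇F(v), w − v⟩. A minimal sketch of the two classical instances (squared Euclidean distance behind gradient descent, relative entropy behind exponentiated gradient); the helper name bregman is illustrative:

import numpy as np

def bregman(F, grad_F, w, v):
    """Bregman divergence D_F(w, v) = F(w) - F(v) - <grad F(v), w - v>."""
    return F(w) - F(v) - grad_F(v) @ (w - v)

# F(w) = 0.5 * ||w||^2 gives half the squared Euclidean distance ...
sq = lambda w: 0.5 * w @ w
sq_grad = lambda w: w

# ... while F(w) = sum(w log w - w) gives the unnormalized relative
# entropy, the divergence behind exponentiated gradient updates.
ent = lambda w: np.sum(w * np.log(w) - w)
ent_grad = lambda w: np.log(w)

w = np.array([0.2, 0.5, 0.3])
v = np.array([0.4, 0.4, 0.2])
print(bregman(sq, sq_grad, w, v))    # equals 0.5 * ||w - v||^2
print(bregman(ent, ent_grad, w, v))  # equals sum(w * log(w/v) - w + v)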