Statistical learning theory for control: A finite-sample perspective

A Tsiamis, I Ziemann, N Matni… - IEEE Control Systems …, 2023 - ieeexplore.ieee.org
Learning algorithms have become an integral component to modern engineering solutions.
Examples range from self-driving cars and recommender systems to finance and even …

Transformers as algorithms: Generalization and stability in in-context learning

Y Li, ME Ildiz, D Papailiopoulos… - … on Machine Learning, 2023 - proceedings.mlr.press
In-context learning (ICL) is a type of prompting where a transformer model operates on a
sequence of (input, output) examples and performs inference on-the-fly. In this work, we …

A tutorial on the non-asymptotic theory of system identification

I Ziemann, A Tsiamis, B Lee, Y Jedra… - 2023 62nd IEEE …, 2023 - ieeexplore.ieee.org
This tutorial serves as an introduction to recently developed non-asymptotic methods in the
theory of (mainly linear) system identification. We emphasize tools we deem particularly …

Learning from many trajectories

S Tu, R Frostig, M Soltanolkotabi - Journal of Machine Learning Research, 2024 - jmlr.org
We initiate a study of supervised learning from many independent sequences ("trajectories")
of non-independent covariates, reflecting tasks in sequence modeling, control, and …
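The many-trajectories setting in this entry can be sketched as ordinary least squares pooling transitions across independent rollouts of a linear system (a minimal illustration only, not the paper's estimator; the dynamics matrix and noise scale in the usage below are invented for the example):

```python
import numpy as np

def fit_from_trajectories(trajs):
    """Least-squares estimate of A in x_{t+1} = A x_t + w_t, pooling
    transitions across many independent trajectories.

    Sketch of the many-trajectories setting; each element of `trajs`
    is a (T+1, d) array holding one rollout.
    """
    X = np.vstack([traj[:-1] for traj in trajs])   # covariates x_t
    Y = np.vstack([traj[1:] for traj in trajs])    # targets x_{t+1}
    # Solve min_A ||X A^T - Y||_F^2, i.e. A^T = X^+ Y
    A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
    return A_hat
```

Within each trajectory the covariates are dependent, but independence across trajectories is what drives the sample-size scaling studied in work of this kind.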

Optimistic active exploration of dynamical systems

L Treven, C Sancaktar, S Blaes… - Advances in Neural …, 2023 - proceedings.neurips.cc
Reinforcement learning algorithms commonly seek to optimize policies for solving one
particular task. How should we explore an unknown dynamical system such that the …

Sharp rates in dependent learning theory: Avoiding sample size deflation for the square loss

I Ziemann, S Tu, GJ Pappas, N Matni - arXiv preprint arXiv:2402.05928, 2024 - arxiv.org
In this work, we study statistical learning with dependent ($\beta$-mixing) data and square
loss in a hypothesis class $\mathscr{F} \subset L_{\Psi_p}$ where $\Psi_p$ is the norm …

PAC-Bayes Generalisation Bounds for Dynamical Systems Including Stable RNNs

D Eringis, J Leth, ZH Tan, R Wisniewski… - Proceedings of the …, 2024 - ojs.aaai.org
In this paper, we derive a PAC-Bayes bound on the generalisation gap, in a supervised time-
series setting for a special class of discrete-time non-linear dynamical systems. This class …

Streaming PCA for Markovian data

S Kumar, P Sarkar - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Since its inception in 1982, Oja's algorithm has become an established method for
streaming principal component analysis (PCA). We study the problem of streaming PCA …
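Oja's rule, the algorithm this entry builds on, can be sketched in a few lines (a minimal single-component version with a fixed step size; the step-size schedule and the Markovian-data analysis in the paper may differ):

```python
import numpy as np

def oja_streaming_pca(stream, dim, lr=0.01, seed=0):
    """Estimate the top principal direction from a stream of vectors
    using Oja's rule.

    Sketch only: a fixed learning rate `lr` is an assumption; analyses
    for Markovian data typically use a decaying step-size schedule.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    for x in stream:
        w += lr * x * (x @ w)      # Hebbian update toward the top eigenvector
        w /= np.linalg.norm(w)     # renormalize to keep w on the unit sphere
    return w
```

The streaming aspect is that each sample `x` is used once and discarded; the Markovian twist studied in the paper is that consecutive samples in the stream are dependent rather than i.i.d.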

The noise level in linear regression with dependent data

I Ziemann, S Tu, GJ Pappas… - Advances in Neural …, 2024 - proceedings.neurips.cc
We derive upper bounds for random design linear regression with dependent ($\beta$-mixing)
data absent any realizability assumptions. In contrast to the strictly realizable …

From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers

ME Ildiz, Y Huang, Y Li, AS Rawat, S Oymak - arXiv preprint arXiv …, 2024 - arxiv.org
Modern language models rely on the transformer architecture and attention mechanism to
perform language understanding and text generation. In this work, we study learning a 1 …