FlashAttention: Fast and memory-efficient exact attention with IO-awareness
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …
Simple hardware-efficient long convolutions for sequence modeling
State space models (SSMs) have high performance on long sequence modeling but require
sophisticated initialization techniques and specialized implementations for high quality and …
Monarch: Expressive structured matrices for efficient and accurate training
Large neural networks excel in many domains, but they are expensive to train and fine-tune.
A popular approach to reduce their compute or memory requirements is to replace dense …
Structured transforms for small-footprint deep learning
V Sindhwani, T Sainath… - Advances in Neural …, 2015 - proceedings.neurips.cc
We consider the task of building compact deep learning pipelines suitable for deployment on
storage and power constrained mobile devices. We propose a unified framework to learn a …
Complexity of computations with matrices and polynomials
V Pan - SIAM review, 1992 - SIAM
MATHEMATICA (see [Dur], [CGGW], [Wol]). Page 1 SIAM REVIEW Vol. 34, No. 2, pp. 225-262,
June 1992 © 1992 Society for Industrial and Applied Mathematics 002 COMPLEXITY OF …
[BOOK][B] Principles of signal detection and parameter estimation
BC Levy - 2008 - books.google.com
As a discipline, signal detection has evolved significantly over the last 40 years. Some changes
have been caused by technical advances, like the development of robust detection methods …
A proposal for Toeplitz matrix calculations
G Strang - Studies in Applied Mathematics, 1986 - Wiley Online Library
In contrast to the usual (and successful) direct methods for Toeplitz systems Ax= b, we
propose an algorithm based on the conjugate gradient method. The preconditioner is a …
[BOOK][B] Algorithms and theory of computation handbook, volume 2: special topics and techniques
MJ Atallah, M Blanton - 2009 - books.google.com
This handbook provides an up-to-date compendium of fundamental computer science
topics, techniques, and applications. Along with updating and revising many of the existing …
[BOOK][B] Iterative methods for Toeplitz systems
MK Ng - 2004 - books.google.com
Page 1 NUMERICAL MATHEMATICS AND SCIENTIFIC COMPUTATION Iterative Methods
for Toeplitz Systems MICHAEL K. NG OXFORD SCIENCE PUBLICATIONS Page 2 Page 3 …