Long-short transformer: Efficient transformers for language and vision
Transformers have achieved success in both language and vision domains. However, it is
prohibitively expensive to scale them to long sequences such as long documents or high …
Exploring the limits of large scale pre-training
Recent developments in large-scale machine learning suggest that by scaling up data,
model size and training time properly, one might observe that improvements in pre-training …
The efficiency misnomer
Model efficiency is a critical aspect of developing and deploying machine learning models.
Inference time and latency directly affect the user experience, and some applications have …
KVT: k-NN Attention for Boosting Vision Transformers
Convolutional Neural Networks (CNNs) have dominated computer vision for years,
due to their ability to capture locality and translation invariance. Recently, many vision …
Scenic: A jax library for computer vision research and beyond
Scenic is an open-source (https://github.com/google-research/scenic) JAX library with a
focus on transformer-based models for computer vision research and beyond. The goal of …
Diagnosis of schizophrenia based on the data of various modalities: biomarkers and machine learning techniques
MG Sharaev, IK Malashenkova… - Современные …, 2022 - cyberleninka.ru
Schizophrenia is a socially significant mental disorder resulting frequently in severe forms of
disability. Diagnosis, choice of treatment tactics, and rehabilitation in clinical psychiatry are …
FLORA: Fine-grained Low-Rank Architecture Search for Vision Transformer
Vision Transformers (ViT) have recently demonstrated success across a myriad of
computer vision tasks. However, their elevated computational demands pose significant …
Learning a fourier transform for linear relative positional encodings in transformers
We propose a new class of linear Transformers called FourierLearner-Transformers (FLTs),
which incorporate a wide range of relative positional encoding mechanisms (RPEs). These …
Softmax bottleneck makes language models unable to represent multi-mode word distributions
HS Chang, A McCallum - Proceedings of the 60th Annual Meeting of the …, 2022 - par.nsf.gov
Neural language models (LMs) such as GPT-2 estimate the probability distribution over the
next word by a softmax over the vocabulary. The softmax layer produces the distribution …
Quickskill: Novice skill estimation in online multiplayer games
Matchmaking systems are vital for creating fair matches in online multiplayer games, which
directly affects players' satisfaction and game experience. Most of the matchmaking …