Long-short transformer: Efficient transformers for language and vision

C Zhu, W Ping, C Xiao, M Shoeybi… - Advances in neural …, 2021 - proceedings.neurips.cc
Transformers have achieved success in both language and vision domains. However, it is
prohibitively expensive to scale them to long sequences such as long documents or high …

Exploring the limits of large scale pre-training

S Abnar, M Dehghani, B Neyshabur… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent developments in large-scale machine learning suggest that by scaling up data,
model size and training time properly, one might observe that improvements in pre-training …

The efficiency misnomer

M Dehghani, A Arnab, L Beyer, A Vaswani… - arXiv preprint arXiv …, 2021 - arxiv.org
Model efficiency is a critical aspect of developing and deploying machine learning models.
Inference time and latency directly affect the user experience, and some applications have …

KVT: k-NN Attention for Boosting Vision Transformers

P Wang, X Wang, F Wang, M Lin, S Chang, H Li… - European conference on …, 2022 - Springer
Convolutional Neural Networks (CNNs) have dominated computer vision for years,
due to their ability to capture locality and translation invariance. Recently, many vision …
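
A minimal sketch of the k-NN attention idea named in the title: each query attends only to its top-k most similar keys rather than to every key. Function name, shapes, and the top-k masking scheme are illustrative assumptions, not the authors' implementation.

```python
# Illustrative k-NN attention: keep only the top-k scores per query,
# mask the rest to -inf, then softmax over the survivors.
import torch

def knn_attention(q, k, v, topk=8):
    """q, k, v: (batch, heads, seq_len, dim). Returns the attention output."""
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale   # (B, H, N, N)
    topk_vals, _ = scores.topk(topk, dim=-1)                # sorted descending
    threshold = topk_vals[..., -1, None]                     # k-th largest score per query
    masked = scores.masked_fill(scores < threshold, float("-inf"))
    attn = masked.softmax(dim=-1)                            # zero weight outside the top-k
    return torch.matmul(attn, v)
```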

Scenic: A jax library for computer vision research and beyond

M Dehghani, A Gritsenko, A Arnab… - Proceedings of the …, 2022 - openaccess.thecvf.com
Scenic is an open-source (https://github.com/google-research/scenic) JAX library with a
focus on transformer-based models for computer vision research and beyond. The goal of …

Diagnosis of schizophrenia based on the data of various modalities: biomarkers and machine learning techniques

MG Sharaev, IK Malashenkova… - Современные …, 2022 - cyberleninka.ru
Schizophrenia is a socially significant mental disorder that frequently results in severe forms of
disability. Diagnosis, choice of treatment tactics, and rehabilitation in clinical psychiatry are …

FLORA: Fine-grained Low-Rank Architecture Search for Vision Transformer

CC Chang, YY Sung, S Yu, NC Huang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision Transformers (ViTs) have recently demonstrated success across a myriad of
computer vision tasks. However, their elevated computational demands pose significant …

Learning a fourier transform for linear relative positional encodings in transformers

K Choromanski, S Li, V Likhosherstov… - International …, 2024 - proceedings.mlr.press
We propose a new class of linear Transformers called FourierLearner-Transformers (FLTs),
which incorporate a wide range of relative positional encoding mechanisms (RPEs). These …
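
An illustrative sketch of the general idea of learning a relative positional encoding in the Fourier domain: the bias b(i-j) is parameterized by learnable frequencies and coefficients. This is an assumption-laden toy version of the RPE component only, not the paper's linear-attention formulation; class and parameter names are hypothetical.

```python
# Sketch: relative positional bias as a learned Fourier series,
# b(i - j) = sum_m c_m * cos(f_m * (i - j)), added to attention logits.
import torch
import torch.nn as nn

class FourierRelativeBias(nn.Module):
    def __init__(self, num_features=32):
        super().__init__()
        self.freqs = nn.Parameter(torch.randn(num_features))   # learnable frequencies f_m
        self.coeffs = nn.Parameter(torch.randn(num_features))  # learnable amplitudes c_m

    def forward(self, seq_len):
        pos = torch.arange(seq_len, dtype=torch.float32)
        rel = pos[:, None] - pos[None, :]                        # (N, N) relative distances
        basis = torch.cos(rel[..., None] * self.freqs)           # (N, N, M)
        return (basis * self.coeffs).sum(-1)                     # (N, N) bias matrix
```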

Softmax bottleneck makes language models unable to represent multi-mode word distributions

HS Chang, A McCallum - Proceedings of the 60th Annual Meeting of the …, 2022 - par.nsf.gov
Neural language models (LMs) such as GPT-2 estimate the probability distribution over the
next word by a softmax over the vocabulary. The softmax layer produces the distribution …
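
A minimal sketch of the standard output layer the snippet describes: a single linear map from the final hidden state to vocabulary logits, followed by a softmax. Dimensions and variable names are assumed for illustration.

```python
# Standard softmax-over-vocabulary output layer of a neural LM.
import torch
import torch.nn as nn

hidden_size, vocab_size = 768, 50257            # GPT-2-like sizes (assumed)
output_layer = nn.Linear(hidden_size, vocab_size, bias=False)

h = torch.randn(1, hidden_size)                 # final hidden state for one position
logits = output_layer(h)                        # (1, vocab_size)
probs = logits.softmax(dim=-1)                  # probability of each next word
# The distribution is produced from a single logit vector of rank at most
# hidden_size, which is the expressiveness limit ("softmax bottleneck")
# that the paper analyzes.
```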

Quickskill: Novice skill estimation in online multiplayer games

C Zhang, K Wang, H Chen, G Fan, Y Li, L Wu… - Proceedings of the 31st …, 2022 - dl.acm.org
Matchmaking systems are vital for creating fair matches in online multiplayer games, which
directly affects players' satisfaction and game experience. Most of the matchmaking …