Stable and low-precision training for large-scale vision-language models

M Wortsman, T Dettmers… - Advances in …, 2023 - proceedings.neurips.cc
We introduce new methods for 1) accelerating and 2) stabilizing training for large language-
vision models. 1) For acceleration, we introduce SwitchBack, a linear layer for int8 quantized …
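
The snippet cuts off before describing SwitchBack itself. As a rough sketch of what an int8 quantized linear forward pass generally involves (symmetric per-tensor scaling; a generic illustration with made-up helper names, not the paper's SwitchBack layer):

```python
import torch

def int8_quantize(x: torch.Tensor):
    """Symmetric per-tensor int8 quantization: returns int8 values and a scale."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def int8_linear_forward(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Quantize activations and weights, multiply, then rescale to float."""
    xq, sx = int8_quantize(x)
    wq, sw = int8_quantize(w)
    # The int8 x int8 product would accumulate in int32 on hardware;
    # emulated here in float for portability.
    return (xq.float() @ wq.float().t()) * (sx * sw)

x = torch.randn(4, 16)   # batch of activations
w = torch.randn(8, 16)   # weight matrix (out_features x in_features)
print(int8_linear_forward(x, w).shape)  # torch.Size([4, 8])
```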

Virchow2: Scaling self-supervised mixed magnification models in pathology

E Zimmermann, E Vorontsov, J Viret, A Casson… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation models are rapidly being developed for computational pathology applications.
However, it remains an open question which factors are most important for downstream …

TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process

Z Ren, M Kim, F Liu, X Liu - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Recently, diffusion models have emerged as a powerful new generative method for 3D point
cloud generation tasks. However, few works study the effect of the architecture of the …

Adaptive computation with elastic input sequence

F Xue, V Likhosherstov, A Arnab… - International …, 2023 - proceedings.mlr.press
Humans have the ability to adapt the type of information they use, the procedure they
employ, and the amount of time they spend when solving problems. However, most standard …

Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach

A Eliav, S Gannot - arXiv preprint arXiv:2403.06856, 2024 - arxiv.org
We present a deep-learning approach for the task of Concurrent Speaker Detection (CSD)
using a modified transformer model. Our model is designed to handle multi-microphone data …

A Fast Target Detection Model for Remote Sensing Images Leveraging Roofline Analysis on Edge Computing Devices

B Zhao, Z Qin, Y Wu, Y Song, H Yu… - IEEE Journal of Selected …, 2024 - ieeexplore.ieee.org
Deploying image target detection algorithms on embedded devices is critical. Previous
studies assumed that fewer model parameters and computations improved the inference …
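
The title's roofline analysis refers to the standard bound that attainable throughput is the minimum of peak compute and arithmetic intensity times memory bandwidth. A tiny worked example with made-up edge-device numbers (not figures from the paper):

```python
def roofline_attainable_gflops(flops: float, bytes_moved: float,
                               peak_gflops: float, mem_bw_gbs: float) -> float:
    """Classic roofline bound: min(peak compute, arithmetic intensity * bandwidth)."""
    arithmetic_intensity = flops / bytes_moved   # FLOPs per byte
    return min(peak_gflops, arithmetic_intensity * mem_bw_gbs)

# Hypothetical edge device: 1.3 TFLOP/s peak, 60 GB/s memory bandwidth.
peak, bw = 1300.0, 60.0
# A layer doing 2 GFLOPs while moving 200 MB has intensity 10 FLOP/byte,
# so it is memory-bound at 600 GFLOP/s despite the 1300 GFLOP/s peak.
print(roofline_attainable_gflops(2e9, 200e6, peak, bw))  # 600.0
```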

Learning from Offline Foundation Features with Tensor Augmentations

E Konuk, C Matsoukas, M Sorkhei… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Learning from Offline Foundation Features with Tensor Augmentations (LOFF-
TA), an efficient training scheme designed to harness the capabilities of foundation models …

Customize Your Own Paired Data via Few-shot Way

J Chen, B Li, M Hua, P Xu, Q He - arXiv preprint arXiv:2405.12490, 2024 - arxiv.org
Existing solutions to image editing tasks suffer from several issues. Although they achieve
remarkably satisfying generated results, some supervised methods require huge amounts of …

TrAct: Making First-layer Pre-Activations Trainable

F Petersen, C Borgelt, S Ermon - arXiv preprint arXiv:2410.23970, 2024 - arxiv.org
We consider the training of the first layer of vision models and notice the clear relationship
between pixel values and gradient update magnitudes: the gradients arriving at the weights …
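
The truncated sentence gestures at a standard backpropagation fact: for the first layer, the weight gradient is the outer product of the incoming error signal and the raw pixel values, so update magnitudes scale with pixel intensity. A minimal illustration of that relationship (not the TrAct method itself):

```python
import torch

# First layer: a single linear map applied to flattened pixels.
w = torch.zeros(10, 4, requires_grad=True)

bright = torch.tensor([0.9, 0.8, 1.0, 0.7])  # high-intensity pixels
dark = 0.1 * bright                          # same image, 10x dimmer

for name, x in [("bright", bright), ("dark", dark)]:
    if w.grad is not None:
        w.grad = None
    loss = (w @ x).sum()   # dL/dW = ones(10) outer x, i.e. proportional to pixel values
    loss.backward()
    print(name, w.grad.abs().mean().item())
# The dark image yields gradients exactly 10x smaller:
# first-layer weight updates track the scale of the input pixels.
```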

Correct Placement of Normalization Layers in Click-Through Rate Prediction Models

İC Yılmaz, H Saribaş, AB Kanburoğlu… - 2023 8th International …, 2023 - ieeexplore.ieee.org
Click-Through Rate (CTR) prediction is an important application in online advertising, and
deep learning-based models have been developed to improve CTR prediction. In this study, the …