Stable and low-precision training for large-scale vision-language models
M Wortsman, T Dettmers… - Advances in …, 2023 - proceedings.neurips.cc
We introduce new methods for 1) accelerating and 2) stabilizing training for large language-
vision models. 1) For acceleration, we introduce SwitchBack, a linear layer for int8 quantized …
Virchow2: Scaling self-supervised mixed magnification models in pathology
Foundation models are rapidly being developed for computational pathology applications.
However, it remains an open question which factors are most important for downstream …
TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process
Recently, diffusion models have emerged as a new powerful generative method for 3D point
cloud generation tasks. However, few works study the effect of the architecture of the …
Adaptive computation with elastic input sequence
Humans have the ability to adapt the type of information they use, the procedure they
employ, and the amount of time they spend when solving problems. However, most standard …
Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach
We present a deep-learning approach for the task of Concurrent Speaker Detection (CSD)
using a modified transformer model. Our model is designed to handle multi-microphone data …
A Fast Target Detection Model for Remote Sensing Images Leveraging Roofline Analysis on Edge Computing Devices
B Zhao, Z Qin, Y Wu, Y Song, H Yu… - IEEE Journal of Selected …, 2024 - ieeexplore.ieee.org
Deploying image target detection algorithms on embedded devices is critical. Previous
studies assumed that fewer model parameters and computations improved the inference …
Learning from Offline Foundation Features with Tensor Augmentations
E Konuk, C Matsoukas, M Sorkhei… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Learning from Offline Foundation Features with Tensor Augmentations (LOFF-
TA), an efficient training scheme designed to harness the capabilities of foundation models …
Customize Your Own Paired Data via Few-shot Way
Existing solutions to image editing tasks suffer from several issues. Though achieving
remarkably satisfying generated results, some supervised methods require huge amounts of …
TrAct: Making First-layer Pre-Activations Trainable
We consider the training of the first layer of vision models and notice the clear relationship
between pixel values and gradient update magnitudes: the gradients arriving at the weights …
Correct Placement of Normalization Layers in Click-Through Rate Prediction Models
Click-Through Rate (CTR) prediction is an important application in online advertising, and
deep learning-based models are developed to maximize CTR prediction. In this study, the …