Accelerating learnt video codecs with gradient decay and layer-wise distillation
In recent years, end-to-end learnt video codecs have demonstrated their potential to
compete with conventional coding algorithms in terms of compression efficiency. However …
SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency
Y Guo, M Kankanhalli - arXiv preprint arXiv:2411.09126, 2024 - arxiv.org
While contrastive pre-training is widely employed, its data efficiency problem has remained
relatively under-explored thus far. Existing methods often rely on static coreset selection …
Navigating Extremes: Dynamic Sparsity in Large Output Spaces
In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-
training pruning for generating efficient models. In principle, DST allows for a more memory …
Mixed Sparsity Training: Achieving 4× FLOP Reduction for Transformer Pretraining
Large language models (LLMs) have made significant strides in complex tasks, yet their
widespread adoption is impeded by substantial computational demands. With hundreds of …
A Comparative Study of Pruning Methods in Transformer-based Time Series Forecasting
The current landscape in time-series forecasting is dominated by Transformer-based
models. Their high parameter count and corresponding demand in computational resources …
Large Learning Agents: Towards Continually Aligned Robots with Scale in RL.
B Grooten - AAMAS, 2024 - ifmas.csc.liv.ac.uk
In the field of deep reinforcement learning, significant progress has been made, but it seems
we are missing the power of the scaling laws evident in large language models. This …
BiDST: Dynamic Sparse Training is a Bi-Level Optimization Problem
Dynamic Sparse Training (DST) is an effective approach for addressing the substantial
training resource requirements posed by the ever-increasing size of the Deep Neural …
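Since several entries above concern Dynamic Sparse Training (DST), the following is a minimal sketch of the prune-and-regrow cycle that DST methods share, in the style of SET (random regrowth). All names and the drop fraction are illustrative assumptions; this is not the specific method of BiDST or any paper listed here.

```python
# Minimal sketch of one DST mask-update step (SET-style random regrowth).
# Illustrative only; not the method of any paper listed above.
import numpy as np

def dst_update_mask(weights: np.ndarray, mask: np.ndarray,
                    drop_fraction: float = 0.3,
                    rng: np.random.Generator | None = None) -> np.ndarray:
    """Prune the smallest-magnitude active weights, then regrow the same
    number of connections at random inactive positions, so the overall
    sparsity level stays constant throughout training."""
    rng = rng or np.random.default_rng()
    active = np.flatnonzero(mask)
    n_drop = int(drop_fraction * active.size)
    if n_drop == 0:
        return mask
    # Prune: deactivate the n_drop active weights with smallest magnitude.
    mags = np.abs(weights.ravel()[active])
    drop_idx = active[np.argsort(mags)[:n_drop]]
    new_mask = mask.copy().ravel()
    new_mask[drop_idx] = 0
    # Regrow: activate n_drop connections chosen uniformly at random among
    # inactive positions (SET regrows randomly; RigL, by contrast, would
    # pick the inactive positions with the largest gradient magnitude).
    inactive = np.flatnonzero(new_mask == 0)
    grow_idx = rng.choice(inactive, size=n_drop, replace=False)
    new_mask[grow_idx] = 1
    return new_mask.reshape(mask.shape)

# Usage: maintain a ~90%-sparse layer, periodically updating its mask.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
mask = (rng.random(W.shape) < 0.1).astype(np.int8)  # ~10% of weights dense
mask = dst_update_mask(W, mask, drop_fraction=0.3, rng=rng)
W *= mask  # newly grown weights would typically be re-initialized to zero
```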