Accelerating learnt video codecs with gradient decay and layer-wise distillation
In recent years, end-to-end learnt video codecs have demonstrated their potential to
compete with conventional coding algorithms in terms of compression efficiency. However …
SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency
Y Guo, M Kankanhalli - arXiv preprint arXiv:2411.09126, 2024 - arxiv.org
While contrastive pre-training is widely employed, its data efficiency problem has remained
relatively under-explored thus far. Existing methods often rely on static coreset selection …
Navigating Extremes: Dynamic Sparsity in Large Output Spaces
In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-
training pruning for generating efficient models. In principle, DST allows for a more memory …
Mixed Sparsity Training: Achieving 4× FLOP Reduction for Transformer Pretraining
Large language models (LLMs) have made significant strides in complex tasks, yet their
widespread adoption is impeded by substantial computational demands. With hundreds of …
A Comparative Study of Pruning Methods in Transformer-based Time Series Forecasting
The current landscape in time-series forecasting is dominated by Transformer-based
models. Their high parameter count and corresponding demand in computational resources …
Large Learning Agents: Towards Continually Aligned Robots with Scale in RL.
B Grooten - AAMAS, 2024 - ifmas.csc.liv.ac.uk
In the field of deep reinforcement learning, significant progress has been made, but it seems
we are missing the power of the scaling laws evident in large language models. This …
BiDST: Dynamic Sparse Training is a Bi-Level Optimization Problem
Dynamic Sparse Training (DST) is an effective approach for addressing the substantial
training resource requirements posed by the ever-increasing size of the Deep Neural …
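Since several entries above concern Dynamic Sparse Training (DST), the following is a minimal sketch of the prune-and-regrow cycle that DST methods share, in the style of SET (random regrowth). All names and the drop fraction are illustrative assumptions; this is not the specific method of BiDST or any paper listed here.

```python
# Minimal sketch of one DST mask-update step (SET-style random regrowth).
# Illustrative only; not the method of any paper listed above.
import numpy as np

def dst_update_mask(weights: np.ndarray, mask: np.ndarray,
                    drop_fraction: float = 0.3,
                    rng: np.random.Generator | None = None) -> np.ndarray:
    """Prune the smallest-magnitude active weights, then regrow the same
    number of connections at random inactive positions, so the overall
    sparsity level stays constant throughout training."""
    rng = rng or np.random.default_rng()
    active = np.flatnonzero(mask)
    n_drop = int(drop_fraction * active.size)
    if n_drop == 0:
        return mask
    # Prune: deactivate the n_drop active weights with smallest magnitude.
    mags = np.abs(weights.ravel()[active])
    drop_idx = active[np.argsort(mags)[:n_drop]]
    new_mask = mask.copy().ravel()
    new_mask[drop_idx] = 0
    # Regrow: activate n_drop connections chosen uniformly at random among
    # inactive positions (SET regrows randomly; RigL, by contrast, would
    # pick the inactive positions with the largest gradient magnitude).
    inactive = np.flatnonzero(new_mask == 0)
    grow_idx = rng.choice(inactive, size=n_drop, replace=False)
    new_mask[grow_idx] = 1
    return new_mask.reshape(mask.shape)

# Usage: maintain a ~90%-sparse layer, periodically updating its mask.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
mask = (rng.random(W.shape) < 0.1).astype(np.int8)  # ~10% of weights dense
mask = dst_update_mask(W, mask, drop_fraction=0.3, rng=rng)
W *= mask  # newly grown weights would typically be re-initialized to zero
```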