A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

FACT: FFN-attention co-optimized transformer architecture with eager correlation prediction

Y Qin, Y Wang, D Deng, Z Zhao, X Yang, L Liu… - Proceedings of the 50th …, 2023 - dl.acm.org
The Transformer model is becoming prevalent in various AI applications owing to its outstanding
performance. However, the high cost of computation and memory footprint make its …

Bi-directional masks for efficient N:M sparse training

Y Zhang, Y Luo, M Lin, Y Zhong, J Xie… - … on machine learning, 2023 - proceedings.mlr.press
We focus on addressing the dense backward propagation issue for training efficiency of N:M
fine-grained sparsity that preserves at most N out of M consecutive weights and achieves …
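As context for this entry, below is a minimal sketch of the N:M constraint itself (here the common 2:4 pattern supported by NVIDIA sparse tensor cores), assuming PyTorch. It only illustrates what an N:M mask preserves; it is not the bidirectional-mask training algorithm proposed in the paper.

```python
# Illustrative N:M fine-grained sparsity: within every group of M consecutive
# weights, keep the N largest-magnitude entries and zero the rest.
import torch

def nm_prune(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Return `weight` with at most n nonzeros per m consecutive entries
    (assumes weight.numel() is divisible by m)."""
    orig_shape = weight.shape
    groups = weight.reshape(-1, m)                       # one row per group of m weights
    idx = groups.abs().topk(n, dim=1).indices            # positions of the n largest magnitudes
    mask = torch.zeros_like(groups).scatter_(1, idx, 1.) # 1 where a weight survives
    return (groups * mask).reshape(orig_shape)

# Example: prune a small linear layer's weight to the 2:4 pattern.
w = torch.randn(8, 16)
w_sparse = nm_prune(w, n=2, m=4)
assert (w_sparse.reshape(-1, 4) != 0).sum(dim=1).max() <= 2
```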

Sparsity in transformers: A systematic literature review

M Farina, U Ahmad, A Taha, H Younes, Y Mesbah… - Neurocomputing, 2024 - Elsevier
Transformers have become the state-of-the-art architectures for various tasks in Natural
Language Processing (NLP) and Computer Vision (CV); however, their space and …

SparseMAE: Sparse training meets masked autoencoders

A Zhou, Y Li, Z Qin, J Liu, J Pan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked Autoencoders (MAE) and their variants have proven to be effective for pretraining
large-scale Vision Transformers (ViTs). However, small-scale models do not benefit from the …

STEP: Learning N:M structured sparsity masks from scratch with precondition

Y Lu, S Agrawal, S Subramanian… - International …, 2023 - proceedings.mlr.press
Recent innovations in hardware (e.g., Nvidia A100) have motivated learning N:M structured
sparsity masks from scratch for fast model inference. However, state-of-the-art learning …

QServe: W4A8KV4 quantization and system co-design for efficient LLM serving

Y Lin, H Tang, S Yang, Z Zhang, G Xiao, C Gan… - arXiv preprint arXiv …, 2024 - arxiv.org
Quantization can accelerate large language model (LLM) inference. Going beyond INT8
quantization, the research community is actively exploring even lower precision, such as …
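To make the precision levels in the title concrete, here is a minimal sketch of symmetric integer quantization with one scale per channel, assuming PyTorch. The function names are illustrative only and do not reflect QServe's actual W4A8KV4 kernels or system co-design; they simply show what 4-bit weights ("W4") and 8-bit activations ("A8") mean numerically.

```python
# Illustrative symmetric quantization: x ≈ q * scale, with q a small signed integer.
import torch

def quantize_symmetric(x: torch.Tensor, bits: int, dim: int = -1):
    """Quantize `x` to signed `bits`-bit integers with one scale per slice along `dim`."""
    qmax = 2 ** (bits - 1) - 1                                    # e.g. 7 for INT4, 127 for INT8
    scale = x.abs().amax(dim=dim, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale                                # int8 storage simulates low-bit values

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)
w_q, w_scale = quantize_symmetric(w, bits=4, dim=1)   # 4-bit weights, per-output-row scale
a = torch.randn(1, 4096)
a_q, a_scale = quantize_symmetric(a, bits=8, dim=-1)  # 8-bit activations, per-token scale
print((w - dequantize(w_q, w_scale)).abs().max())     # worst-case weight quantization error
```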

Deep Learning in Environmental Toxicology: Current Progress and Open Challenges

H Tan, J Jin, C Fang, Y Zhang, B Chang… - ACS ES&T …, 2023 - ACS Publications
Ubiquitous chemicals in the environment may pose a threat to human health and the
ecosystem, so comprehensive toxicity information must be obtained. Due to the inability of …

BEBERT: Efficient and robust binary ensemble BERT

J Tian, C Fang, H Wang, Z Wang - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Pre-trained BERT models have achieved impressive accuracy on natural language
processing (NLP) tasks. However, their excessive number of parameters hinders them from …