A survey of techniques for optimizing transformer inference
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …
Full stack optimization of transformer inference: a survey
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …
FACT: FFN-attention co-optimized transformer architecture with eager correlation prediction
The Transformer model is becoming prevalent in various AI applications with its outstanding
performance. However, the high cost of computation and memory footprint make its …
Bi-directional masks for efficient N:M sparse training
We focus on addressing the dense backward propagation issue for training efficiency of
N:M fine-grained sparsity that preserves at most N out of M consecutive weights and achieves …
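For reference, the N:M constraint described above can be illustrated with a magnitude-based mask over groups of M consecutive weights. The sketch below (plain NumPy, 2:4 pattern, names chosen for illustration) is a generic example of the constraint only, not the bi-directional mask scheme proposed in the paper.

import numpy as np

def nm_sparsity_mask(weights, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m consecutive
    weights along the last axis (a generic N:M pattern, e.g. 2:4)."""
    flat = weights.reshape(-1, m)                      # group consecutive weights
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]  # smallest-magnitude entries per group
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)       # zero out the smallest
    return mask.reshape(weights.shape)

w = np.random.randn(4, 8)
mask = nm_sparsity_mask(w, n=2, m=4)
print((mask.reshape(-1, 4).sum(axis=1) == 2).all())    # every group keeps exactly 2 of 4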
Sparsity in transformers: A systematic literature review
Transformers have become the state-of-the-art architectures for various tasks in Natural
Language Processing (NLP) and Computer Vision (CV); however, their space and …
SparseMAE: Sparse training meets masked autoencoders
Masked Autoencoders (MAE) and its variants have proven to be effective for pretraining
large-scale Vision Transformers (ViTs). However, small-scale models do not benefit from the …
STEP: Learning N:M structured sparsity masks from scratch with precondition
Recent innovations in hardware (e.g., Nvidia A100) have motivated learning N:M structured
sparsity masks from scratch for fast model inference. However, state-of-the-art learning …
QServe: W4A8KV4 quantization and system co-design for efficient LLM serving
Quantization can accelerate large language model (LLM) inference. Going beyond INT8
quantization, the research community is actively exploring even lower precision, such as …
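As a point of reference for the W4A8KV4 notation (4-bit weights, 8-bit activations, 4-bit KV cache), the following sketch shows generic symmetric integer quantization and dequantization in NumPy. It only illustrates the precision levels; it is not QServe's actual quantization algorithm or serving system.

import numpy as np

def quantize_symmetric(x, bits):
    """Generic symmetric per-tensor quantization to a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for INT4, 127 for INT8
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w  = np.random.randn(128, 128)                 # weights
a  = np.random.randn(16, 128)                  # activations
kv = np.random.randn(16, 128)                  # KV cache entries

w_q,  w_s  = quantize_symmetric(w,  bits=4)    # W4: 4-bit weights
a_q,  a_s  = quantize_symmetric(a,  bits=8)    # A8: 8-bit activations
kv_q, kv_s = quantize_symmetric(kv, bits=4)    # KV4: 4-bit KV cache
print(np.abs(dequantize(w_q, w_s) - w).mean()) # mean weight reconstruction error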
Deep Learning in Environmental Toxicology: Current Progress and Open Challenges
H Tan, J Jin, C Fang, Y Zhang, B Chang… - ACS ES&T …, 2023 - ACS Publications
Ubiquitous chemicals in the environment may pose a threat to human health and the
ecosystem, so comprehensive toxicity information must be obtained. Due to the inability of …
BEBERT: Efficient and robust binary ensemble BERT
Pre-trained BERT models have achieved impressive accuracy on natural language
processing (NLP) tasks. However, their excessive number of parameters hinders them from …
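For context, weight binarization typically replaces full-precision weights with {-1, +1} values plus a per-row scaling factor. The sketch below shows this standard scheme in NumPy as a generic illustration; it does not reproduce BEBERT's ensemble construction or training procedure.

import numpy as np

def binarize_weights(w):
    """Standard 1-bit weight binarization: sign(w) scaled by the mean
    absolute value per output row (a common scheme, not BEBERT-specific)."""
    alpha = np.abs(w).mean(axis=1, keepdims=True)   # per-row scaling factor
    return np.sign(w) * alpha

w = np.random.randn(8, 16)
w_bin = binarize_weights(w)
print(np.abs(w - w_bin).mean())   # approximation error of the 1-bit weights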