Hybridized artificial intelligence models with nature-inspired algorithms for river flow modeling: A comprehensive review, assessment, and possible future research …

H Tao, SI Abba, AM Al-Areeq, F Tangang… - … Applications of Artificial …, 2024 - Elsevier
River flow (Q flow) is a hydrological process that considerably impacts the management and
sustainability of water resources. The literature has shown great potential for nature-inspired …
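
The hybridization the review surveys pairs a nature-inspired optimizer with a data-driven model. As a minimal sketch (the data, constants, and single-hidden-neuron model below are illustrative, not from the paper), particle swarm optimization can tune the parameters of a tiny flow predictor:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((200, 3))                          # synthetic lagged-flow predictors
    y = 2 * X[:, 0] - X[:, 1] + 0.5 * np.sin(3 * X[:, 2])  # synthetic Q_flow target

    def predict(w, X):
        # tiny model: one tanh neuron; w = [w1, w2, w3, bias, output scale]
        return w[4] * np.tanh(X @ w[:3] + w[3])

    def mse(w):
        return np.mean((predict(w, X) - y) ** 2)

    # Plain PSO over the 5 model parameters (swarm size and coefficients are illustrative).
    n, d = 30, 5
    pos = rng.normal(size=(n, d)); vel = np.zeros((n, d))
    pbest = pos.copy(); pbest_val = np.array([mse(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(200):
        r1, r2 = rng.random((n, d)), rng.random((n, d))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos += vel
        vals = np.array([mse(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    print("best MSE:", pbest_val.min())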

Efficient memory management for large language model serving with pagedattention

W Kwon, Z Li, S Zhuang, Y Sheng, L Zheng… - Proceedings of the 29th …, 2023 - dl.acm.org
High throughput serving of large language models (LLMs) requires batching sufficiently
many requests at a time. However, existing systems struggle because the key-value cache …
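
PagedAttention's core idea is to manage the KV cache like paged virtual memory: cached keys and values live in fixed-size physical blocks, and each sequence indirects through a block table, so memory is allocated on demand with no large contiguous reservation per request. A minimal sketch with illustrative block and pool sizes (not vLLM's actual data structures):

    import numpy as np

    BLOCK, HEAD_DIM, NUM_PHYS = 16, 8, 64             # illustrative sizes
    kv_pool = np.zeros((NUM_PHYS, BLOCK, HEAD_DIM))   # physical KV blocks
    free = list(range(NUM_PHYS))                      # free-block list

    class Sequence:
        def __init__(self):
            self.block_table = []     # logical block index -> physical block id
            self.len = 0
        def append_kv(self, kv):
            if self.len % BLOCK == 0:                 # current block full: map a new one
                self.block_table.append(free.pop())
            kv_pool[self.block_table[-1], self.len % BLOCK] = kv
            self.len += 1

    seq = Sequence()
    for t in range(40):
        seq.append_kv(np.full(HEAD_DIM, t, dtype=float))
    print(seq.block_table, seq.len)   # 40 tokens occupy just 3 physical blocks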

Orca: A distributed serving system for Transformer-based generative models

GI Yu, JS Jeong, GW Kim, S Kim, BG Chun - 16th USENIX Symposium …, 2022 - usenix.org
Large-scale Transformer-based models trained for generation tasks (e.g., GPT-3) have
recently attracted huge interest, emphasizing the need for system support for serving models …
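
Orca's key mechanism is iteration-level scheduling: the serving batch is recomposed at every decoding step, so finished requests leave and queued requests join without waiting for the whole batch to drain. A toy sketch of that loop (request lengths and the batch-size cap are made up):

    from collections import deque

    class Request:
        def __init__(self, rid, target_len):
            self.rid, self.generated, self.target = rid, 0, target_len

    def step(batch):
        # one decoding iteration: every batched request emits one token
        for r in batch:
            r.generated += 1

    queue = deque(Request(i, n) for i, n in enumerate([3, 7, 5]))
    batch, MAX_BATCH = [], 2
    while queue or batch:
        while queue and len(batch) < MAX_BATCH:   # admit new requests each iteration
            batch.append(queue.popleft())
        step(batch)
        for r in [r for r in batch if r.generated >= r.target]:
            print(f"request {r.rid} finished after {r.generated} tokens")
            batch.remove(r)                       # its slot frees up immediately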

A fast post-training pruning framework for transformers

W Kwon, S Kim, MW Mahoney… - Advances in …, 2022 - proceedings.neurips.cc
Pruning is an effective way to reduce the huge inference cost of Transformer models.
However, prior work on pruning Transformers requires retraining the models. This can add …
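
Retraining-free structured pruning reduces to scoring structures (e.g., attention heads) and masking the lowest-scoring ones at inference time. The paper derives importance from Fisher information on a small calibration set; the sketch below substitutes a plain weight-norm score to stay self-contained:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(12, 64))   # e.g., 12 heads' projections, one row per head

    # Score each head, keep the top half, and apply a binary mask; no
    # gradient steps or retraining are involved.
    scores = np.linalg.norm(W, axis=1)                  # stand-in importance score
    keep = scores >= np.sort(scores)[len(scores) // 2]  # prune 50% of heads
    W_pruned = W * keep[:, None].astype(W.dtype)
    print(f"kept {keep.sum()} of {len(keep)} heads")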

Dnnfusion: accelerating deep neural networks execution with advanced operator fusion

W Niu, J Guan, Y Wang, G Agrawal, B Ren - Proceedings of the 42nd …, 2021 - dl.acm.org
Deep Neural Networks (DNNs) have emerged as the core enabler of many major
applications on mobile devices. To achieve high accuracy, DNN models have become …
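
Operator fusion replaces a chain of kernels, each writing its intermediate result back to memory, with a single kernel that makes one pass over the data. A schematic contrast (tiny array; NumPy and a Python loop stand in for generated kernels):

    import numpy as np

    x = np.arange(-4, 4, dtype=np.float32)

    def unfused(x):
        a = x * 2.0                   # kernel 1 materializes a
        b = a + 1.0                   # kernel 2 reads a, materializes b
        return np.maximum(b, 0.0)     # kernel 3 reads b

    def fused(x):
        out = np.empty_like(x)
        for i in range(x.size):       # one pass, no intermediate buffers
            out[i] = max(x[i] * 2.0 + 1.0, 0.0)
        return out

    assert np.allclose(unfused(x), fused(x))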

ROLLER: Fast and efficient tensor compilation for deep learning

H Zhu, R Wu, Y Diao, S Ke, H Li, C Zhang… - … USENIX Symposium on …, 2022 - usenix.org
Despite recent advances in tensor compilers, it often costs hours to generate an efficient
kernel for an operator, a compute-intensive sub-task in a deep neural network (DNN), on …
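
Rather than searching a huge schedule space, a tile-based approach enumerates a small set of tile shapes aligned to the memory hierarchy and ranks them with an analytic model, which is why kernel construction takes seconds rather than hours. A toy version for a matmul, with an assumed fast-memory budget and a compute-intensity score (all constants illustrative):

    M = N = K = 1024
    SMEM_FLOATS = 12 * 1024            # illustrative shared-memory budget, in floats

    best = None
    for tm in (16, 32, 64, 128):
        for tn in (16, 32, 64, 128):
            for tk in (8, 16, 32):
                footprint = tm * tk + tk * tn        # tiles resident in fast memory
                if footprint > SMEM_FLOATS:
                    continue                          # violates the memory budget
                # flops per byte moved: a simple stand-in for the cost model
                intensity = (tm * tn * tk) / (footprint * 4)
                if best is None or intensity > best[0]:
                    best = (intensity, tm, tn, tk)
    print("chosen tile (M, N, K):", best[1:])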

nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training

Z Lin, Y Miao, Q Zhang, F Yang, Y Zhu, C Li… - … USENIX Symposium on …, 2024 - usenix.org
With the growing model size of deep neural networks (DNN), deep learning training is
increasingly relying on handcrafted search spaces to find efficient parallelization execution …
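
A constraint-guided plan generator enumerates candidate partitionings of an operator across devices, discards any plan that violates the stated constraints, and ranks the survivors by estimated cost. A toy version for a single matmul (device count, constraints, and the communication term are illustrative):

    D = 8
    M, N, K = 4096, 4096, 4096

    plans = []
    for dm in (1, 2, 4, 8):
        for dn in (1, 2, 4, 8):
            for dk in (1, 2, 4, 8):
                if dm * dn * dk != D:
                    continue                  # constraint: use all devices
                if M % dm or N % dn or K % dk:
                    continue                  # constraint: even partitioning
                # partial-sum reduction traffic when K is split (illustrative cost)
                comm = (dk - 1) * (M // dm) * (N // dn)
                plans.append((comm, dm, dn, dk))
    print(sorted(plans)[0])                   # cheapest valid plan (ties broken arbitrarily)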

AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction

S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu… - Proceedings of the 49th …, 2022 - dl.acm.org
Hardware specialization is a promising trend to sustain performance growth. Spatial
hardware accelerators that employ specialized and hierarchical computation and memory …
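
The hardware abstraction lets a compiler enumerate mappings from software loop dimensions to the accelerator's spatial levels and validate each one automatically, instead of relying on a few hand-written intrinsics. A toy enumeration over an assumed 16x16 PE array (all shapes invented):

    from itertools import permutations

    PE_ROWS, PE_COLS = 16, 16
    dims = {"m": 256, "n": 512, "k": 60}      # loop extents of a matmul (illustrative)

    valid = []
    for rows_dim, cols_dim, serial_dim in permutations(dims):
        # keep mappings whose spatial dimensions tile the PE array exactly
        if dims[rows_dim] % PE_ROWS == 0 and dims[cols_dim] % PE_COLS == 0:
            valid.append((rows_dim, cols_dim, serial_dim))
    print(valid)    # only mappings that serialize "k" survive here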

Apollo: Automatic partition-based operator fusion through layer by layer optimization

J Zhao, X Gao, R Xia, Z Zhang… - Proceedings of …, 2022 - proceedings.mlsys.org
We study fusion for deep neural networks (DNNs) in Apollo, a just-in-time (JIT) compilation
framework. It considers both memory- and compute-bound tensor operators for fusion …
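
Partition-based fusion first cuts the operator graph into fusible partitions, for instance by attaching runs of memory-bound elementwise ops to the compute-bound op that produces their input. A greedy one-dimensional sketch over a made-up op sequence:

    MEMORY_BOUND = {"add", "mul", "relu", "cast"}   # illustrative classification

    ops = ["matmul", "add", "relu", "matmul", "cast", "mul", "conv", "relu"]
    partitions, current = [], []
    for op in ops:
        if op in MEMORY_BOUND and current:
            current.append(op)          # fuse into the open partition
        else:
            if current:
                partitions.append(current)
            current = [op]              # a compute-bound op opens a new partition
    if current:
        partitions.append(current)
    # [['matmul','add','relu'], ['matmul','cast','mul'], ['conv','relu']]
    print(partitions)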

AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures

Z Zheng, X Yang, P Zhao, G Long, K Zhu… - Proceedings of the 27th …, 2022 - dl.acm.org
This work reveals that memory-intensive computation is a rising performance-critical factor in
recent machine learning models. Due to a unique set of new challenges, existing ML …
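
Stitching goes beyond simple elementwise fusion by letting dependent memory-intensive ops, reductions included, exchange intermediates through fast on-chip memory instead of round-tripping through global memory. A schematic contrast, with NumPy rows standing in for thread blocks:

    import numpy as np

    x = np.random.rand(4, 1024).astype(np.float32)

    # Two separate memory-intensive kernels: the reduction result goes out
    # to global memory and is read back by the elementwise op.
    def two_kernels(x):
        m = x.max(axis=1, keepdims=True)     # kernel 1: row-wise reduce
        return np.exp(x - m)                 # kernel 2: elementwise

    # "Stitched": one pass per row keeps the reduction result in fast
    # memory, the kind of locality the new optimization space exposes.
    def stitched(x):
        out = np.empty_like(x)
        for r in range(x.shape[0]):          # one "thread block" per row
            m = x[r].max()                   # stays in fast memory
            out[r] = np.exp(x[r] - m)
        return out

    assert np.allclose(two_kernels(x), stitched(x))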