AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training...

X Jia, L Jiang, A Wang, W Xiao, Z Shi, J Zhang… - 2022 USENIX Annual …, 2022 - usenix.org

The scaling up of deep neural networks has been demonstrated to be effective in improving
model quality, but also encompasses several training challenges in terms of training …

Cited by 33 · Related articles · All 6 versions

Drew: Efficient Winograd CNN inference with deep reuse

R Wu, F Zhang, J Guan, Z Zheng, X Du… - Proceedings of the ACM …, 2022 - dl.acm.org

Deep learning has been used in various domains, including Web services. Convolutional
neural networks (CNNs), which are deep learning representatives, are among the most …

Cited by 15 · Related articles · All 4 versions

TCB: Accelerating transformer inference services with request concatenation

B Fu, F Chen, P Li, D Zeng - … of the 51st International Conference on …, 2022 - dl.acm.org

Transformer has dominated the field of natural language processing because of its strong
capability in learning from sequential input data. In recent years, various computing and …

Cited by 11 · Related articles · All 3 versions

PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences

S Zhang, W Cui, Q Chen, Z Zhang, Y Guan… - Proceedings of the 36th …, 2022 - dl.acm.org

In emerging DNN serving systems, queries are usually batched to fully leverage hardware
resources, and all the queries in a batch run through the complete model and return at the …

Cited by 6 · Related articles · All 2 versions

EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers

L Jiang, P Xu, Q Zhu, X Li, S Yan, X Zhang… - Proceedings of the 51st …, 2022 - dl.acm.org

In recent years, memory-intensive operations are becoming dominant in efficiency of
running novel neural networks. Just-in-time operator fusion on accelerating devices like …

Cited by 1 · Related articles

ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations

Z Xu, J Xu, H Peng, W Wang, X Wang, H Wan… - arXiv preprint arXiv …, 2022 - arxiv.org

Deep learning models rely on highly optimized tensor libraries for efficient inference on
heterogeneous hardware. Current deep compilers typically predetermine layouts of tensors …

RCM: Residue-aware Consolidation for Heterogeneous MLaaS Cluster

K Wu, C Xu, M Zhang, X Hu, Y Jin… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org

With the rapid development of Machine Learning (ML), Machine-Learning-as-a-Service
(MLaaS) clusters appear in large numbers to support cloud platforms services, which adopt …