Whale: Efficient giant model training over heterogeneous GPUs

X Jia, L Jiang, A Wang, W Xiao, Z Shi, J Zhang… - 2022 USENIX Annual …, 2022 - usenix.org
The scaling up of deep neural networks has been demonstrated to be effective in improving
model quality, but also encompasses several training challenges in terms of training …

DREW: Efficient Winograd CNN inference with deep reuse

R Wu, F Zhang, J Guan, Z Zheng, X Du… - Proceedings of the ACM …, 2022 - dl.acm.org
Deep learning has been used in various domains, including Web services. Convolutional
neural networks (CNNs), which are deep learning representatives, are among the most …

TCB: Accelerating transformer inference services with request concatenation

B Fu, F Chen, P Li, D Zeng - … of the 51st International Conference on …, 2022 - dl.acm.org
Transformer has dominated the field of natural language processing because of its strong
capability in learning from sequential input data. In recent years, various computing and …

PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences

S Zhang, W Cui, Q Chen, Z Zhang, Y Guan… - Proceedings of the 36th …, 2022 - dl.acm.org
In emerging DNN serving systems, queries are usually batched to fully leverage hardware
resources, and all the queries in a batch run through the complete model and return at the …

EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers

L Jiang, P Xu, Q Zhu, X Li, S Yan, X Zhang… - Proceedings of the 51st …, 2022 - dl.acm.org
In recent years, memory-intensive operations are becoming dominant in efficiency of
running novel neural networks. Just-in-time operator fusion on accelerating devices like …

ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations

Z Xu, J Xu, H Peng, W Wang, X Wang, H Wan… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning models rely on highly optimized tensor libraries for efficient inference on
heterogeneous hardware. Current deep compilers typically predetermine layouts of tensors …

RCM: Residue-aware Consolidation for Heterogeneous MLaaS Cluster

K Wu, C Xu, M Zhang, X Hu, Y Jin… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
With the rapid development of Machine Learning (ML), Machine-Learning-as-a-Service
(MLaaS) clusters appear in large numbers to support cloud platforms services, which adopt …