Whale: Efficient giant model training over heterogeneous GPUs
The scaling up of deep neural networks has been demonstrated to be effective in improving
model quality, but also encompasses several training challenges in terms of training …
Drew: Efficient Winograd CNN inference with deep reuse
Deep learning has been used in various domains, including Web services. Convolutional
neural networks (CNNs), which are deep learning representatives, are among the most …
TCB: Accelerating transformer inference services with request concatenation
Transformer has dominated the field of natural language processing because of its strong
capability in learning from sequential input data. In recent years, various computing and …
PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences
In emerging DNN serving systems, queries are usually batched to fully leverage hardware
resources, and all the queries in a batch run through the complete model and return at the …
EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers
In recent years, memory-intensive operations are becoming dominant in efficiency of
running novel neural networks. Just-in-time operator fusion on accelerating devices like …
ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations
Z Xu, J Xu, H Peng, W Wang, X Wang, H Wan… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning models rely on highly optimized tensor libraries for efficient inference on
heterogeneous hardware. Current deep compilers typically predetermine layouts of tensors …
RCM: Residue-aware Consolidation for Heterogeneous MLaaS Cluster
K Wu, C Xu, M Zhang, X Hu, Y Jin… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
With the rapid development of Machine Learning (ML), Machine-Learning-as-a-Service
(MLaaS) clusters appear in large numbers to support cloud platforms services, which adopt …