BaPipe: Exploration of balanced pipeline parallelism for DNN training

L Zhao, R Xu, T Wang, T Tian, X Wang, W Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
The size of deep neural networks (DNNs) grows rapidly as the complexity of the machine
learning algorithm increases. To satisfy the requirement of computation and memory of DNN …

Scheduling distributed deep learning jobs in heterogeneous cluster with placement awareness

Q Li, J Xu, C Cao - Proceedings of the 12th Asia-Pacific Symposium on …, 2020 - dl.acm.org
Deep Neural Network models are integrated as parts of many real-world software
applications. Due to the huge model size and complex computation, distributed deep …

System and method for dynamic scheduling of distributed deep learning training jobs

T Capes, I Mohomed, V Raheja… - US Patent 11,693,706, 2023 - Google Patents
A scheduling algorithm for scheduling training of deep neural network (DNN) weights on
processing units identifies a next job to provisionally assign a processing unit (PU) based on …