DeepVM: Integrating Spot and On-Demand VMs for Cost-Efficient Deep Learning Clusters in the Cloud

Y Kim, K Kim, Y Cho, J Kim, A Khan, KD Kang… - arXiv preprint arXiv …, 2024 - arxiv.org
Distributed Deep Learning (DDL) relies on GPU-based clusters as the preferred
infrastructure for training large-scale Deep Neural Networks (DNNs). However …