Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
In the last few years, the deep learning (DL) computing paradigm has been deemed the
Gold Standard in the machine learning (ML) community. Moreover, it has gradually become …
Custom scheduling in Kubernetes: A survey on common problems and solution approaches
Z Rejiba, J Chamanara - ACM Computing Surveys, 2022 - dl.acm.org
Since its release in 2014, Kubernetes has become a popular choice for orchestrating
containerized workloads at scale. To determine the most appropriate node to host a given …
MLaaS in the wild: Workload analysis and scheduling in Large-Scale heterogeneous GPU clusters
With the sustained technological advances in machine learning (ML) and the availability of
massive datasets recently, tech companies are deploying large ML-as-a-Service (MLaaS) …
Characterization and prediction of deep learning workloads in large-scale GPU datacenters
Modern GPU datacenters are critical for delivering Deep Learning (DL) models and services
in both the research community and industry. When operating a datacenter, optimization of …
Fluid: Dataset abstraction and elastic acceleration for cloud-native deep learning training jobs
R Gu, K Zhang, Z Xu, Y Che, B Fan… - 2022 IEEE 38th …, 2022 - ieeexplore.ieee.org
Nowadays, it is prevalent to train deep learning (DL) models in cloud-native platforms that
actively leverage containerization and orchestration technologies for high elasticity, low and …
Liquid: Intelligent resource estimation and network-efficient scheduling for deep learning jobs on distributed GPU clusters
Deep learning (DL) is becoming increasingly popular in many domains, including computer
vision, speech recognition, self-driving automobiles, etc. GPU can train DL models efficiently …
CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters
We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters.
CASSINI introduces a novel geometric abstraction to consider the communication pattern of …
Transparent GPU sharing in container clouds for deep learning workloads
Containers are widely used for resource management in datacenters. A common practice to
support deep learning (DL) training in container clouds is to statically bind GPUs to …
Deep learning workload scheduling in GPU datacenters: A survey
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …
Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent
Large tech companies are piling up a massive number of GPUs in their server fleets to run
diverse machine learning (ML) workloads. However, these expensive devices often suffer …