Training Machine Learning Models On A Large-Scale Distributed System Using A Job Server

X Chen, H Zhou, D Wang - US Patent App. 15/497,749, 2018 - Google Patents
A computer system for training machine learning models includes a job server and a
plurality of compute nodes. The job server receives jobs for training machine learning
models and allocates these training jobs to groups of one or more compute nodes. The
allocation is based on the current requirements of the training jobs and the current status of
the compute nodes. The training jobs include updating values for the parameters (e.g.,
weights and biases) of the machine learning models. Preferably, the compute nodes in the …
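The allocation step described in the abstract can be illustrated with a minimal sketch. It is not the patent's method: the class names (`ComputeNode`, `TrainingJob`, `JobServer`), the use of free GPUs as the node "status", and the first-fit selection rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    node_id: str
    free_gpus: int      # hypothetical status metric: GPUs available on this node
    busy: bool = False  # True while the node belongs to a running job's group


@dataclass
class TrainingJob:
    job_id: str
    nodes_needed: int   # hypothetical requirement: size of the node group
    gpus_per_node: int  # hypothetical requirement: GPUs each node must offer


class JobServer:
    """Toy job server: assigns each training job to a group of compute nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.assignments = {}  # job_id -> list of node_ids in the group

    def allocate(self, job):
        # Match the job's current requirements against each node's current status.
        eligible = [
            n for n in self.nodes
            if not n.busy and n.free_gpus >= job.gpus_per_node
        ]
        if len(eligible) < job.nodes_needed:
            return None  # not enough suitable nodes right now; caller may retry
        group = eligible[:job.nodes_needed]  # first-fit choice (an assumption)
        for n in group:
            n.busy = True
        self.assignments[job.job_id] = [n.node_id for n in group]
        return self.assignments[job.job_id]

    def release(self, job_id):
        # Free the group's nodes once the training job finishes.
        ids = set(self.assignments.pop(job_id, []))
        for n in self.nodes:
            if n.node_id in ids:
                n.busy = False
```

A caller would construct the server with a list of nodes, call `allocate` for each incoming job, and call `release` when a job completes so that its nodes become available to later jobs.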