1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs.

F Seide, H Fu, J Droppo, G Li, D Yu - Interspeech, 2014 - isca-archive.org
We show empirically that in SGD training of deep neural networks, one can, at no or nearly
no loss of accuracy, quantize the gradients aggressively—to but one bit per value—if the …

Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering

K Chen, Q Huo - … conference on acoustics, speech and signal …, 2016 - ieeexplore.ieee.org
We present a new approach to scalable training of deep learning machines by incremental
block training with intra-block parallel optimization to leverage data parallelism and …

Asynchronously training machine learning models across client devices for adaptive intelligence

S Choudhary, SK Mishra, A Garg - US Patent 11,593,634, 2023 - Google Patents
2018-06-19 Assigned to ADOBE SYSTEMS INCORPORATED reassignment ADOBE
SYSTEMS INCORPORATED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT …

Computing system for training neural networks

J Langford, G Li, FTB Seide, J Droppo… - US Patent 11,049,006, 2021 - Google Patents
Techniques and constructs can reduce the time required to determine solutions to
optimization problems such as training of neural networks. Modifications to a computational …

Implementing network security measures in response to a detected cyber attack

MS Musuvathi, TD Mytkowicz, S Maleki… - US Patent …, 2020 - Google Patents
Described herein is a system that transmits and combines local models, that individually include
a set of local parameters computed via stochastic gradient descent (SGD), into a global …

Filter specificity as training criterion for neural networks

RB Towal - US Patent 10,515,304, 2019 - Google Patents
US10515304B2 - Filter specificity as training criterion for neural networks - Google Patents

Determining a likelihood of a user interaction with a content element

MS Musuvathi, TD Mytkowicz, S Maleki… - US Patent …, 2021 - Google Patents
Described herein is a system that transmits and combines local models, that individually
comprise a set of local parameters computed via stochastic gradient descent (SGD), into a …

Determining a likelihood of a resource experiencing a problem based on telemetry data

MS Musuvathi, TD Mytkowicz, S Maleki… - US Patent …, 2019 - Google Patents
Described herein is a system that transmits and combines local models, that individually
comprise a set of local parameters computed via stochastic gradient descent (SGD), into a …

Training neural networks on partitioned training data

I Sutskever, W Zaremba - US Patent 10,380,482, 2019 - Google Patents
Methods, systems, and apparatus, including computer programs encoded on computer
storage media, for training a neural network. One of the methods includes obtaining …

Method and apparatus for training a learning machine

K Chen, Q Huo - US Patent 11,334,814, 2022 - Google Patents
The disclosure relates to a method and apparatus for training a learning machine, wherein
the apparatus includes: a broadcasting module for broadcasting an initial global model for a …