Deep learning using alternating direction method of multipliers

1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs

F Seide, H Fu, J Droppo, G Li, D Yu - Interspeech, 2014 - isca-archive.org

We show empirically that in SGD training of deep neural networks, one can, at no or nearly
no loss of accuracy, quantize the gradients aggressively—to but one bit per value—if the …

Cited by 1080 · Related articles · All 6 versions

Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering

K Chen, Q Huo - … conference on acoustics, speech and signal …, 2016 - ieeexplore.ieee.org

We present a new approach to scalable training of deep learning machines by incremental
block training with intra-block parallel optimization to leverage data parallelism and …

Cited by 164 · Related articles · All 4 versions

Asynchronously training machine learning models across client devices for adaptive intelligence

S Choudhary, SK Mishra, A Garg - US Patent 11,593,634, 2023 - Google Patents

2018-06-19 Assigned to ADOBE SYSTEMS INCORPORATED reassignment ADOBE
SYSTEMS INCORPORATED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT …

Cited by 37 · Related articles · All 4 versions

Computing system for training neural networks

J Langford, G Li, FTB Seide, J Droppo… - US Patent 11,049,006, 2021 - Google Patents

Techniques and constructs can reduce the time required to determine solutions to
optimization problems such as training of neural networks. Modifications to a computational …

Cited by 38 · Related articles · All 4 versions

Implementing network security measures in response to a detected cyber attack

MS Musuvathi, TD Mytkowicz, S Maleki… - US Patent …, 2020 - Google Patents

Described herein is a system that transmits and combines local models, that individually include
a set of local parameters computed via stochastic gradient descent (SGD), into a global …

Cited by 27 · Related articles · All 4 versions

Filter specificity as training criterion for neural networks

RB Towal - US Patent 10,515,304, 2019 - Google Patents

US10515304B2 - Filter specificity as training criterion for neural networks - Google Patents

Cited by 26 · Related articles · All 4 versions

Determining a likelihood of a user interaction with a content element

MS Musuvathi, TD Mytkowicz, S Maleki… - US Patent …, 2021 - Google Patents

Described herein is a system that transmits and combines local models, that individually
comprise a set of local parameters computed via stochastic gradient descent (SGD), into a …

Cited by 19 · Related articles · All 4 versions

Determining a likelihood of a resource experiencing a problem based on telemetry data

MS Musuvathi, TD Mytkowicz, S Maleki… - US Patent …, 2019 - Google Patents

Described herein is a system that transmits and combines local models, that individually
comprise a set of local parameters computed via stochastic gradient descent (SGD), into a …

Cited by 13 · Related articles · All 4 versions

Training neural networks on partitioned training data

I Sutskever, W Zaremba - US Patent 10,380,482, 2019 - Google Patents

Methods, systems, and apparatus, including computer programs encoded on computer
storage media, for training a neural network. One of the methods includes obtaining …

Cited by 16 · Related articles · All 4 versions

Method and apparatus for training a learning machine

K Chen, Q Huo - US Patent 11,334,814, 2022 - Google Patents

The disclosure relates to a method and apparatus for training a learning machine, wherein
the apparatus includes: a broadcasting module for broadcasting an initial global model for a …

Cited by 8 · Related articles · All 4 versions