Convergence of edge computing and deep learning: A comprehensive survey
Ubiquitous sensors and smart devices from factories and communities are generating
massive amounts of data, and ever-increasing computing power is driving the core of …
Recent advances in deep learning for speech research at Microsoft
Deep learning is becoming a mainstream technology for speech recognition at industrial
scale. In this paper, we provide an overview of the work by Microsoft speech researchers …
PipeDream: Generalized pipeline parallelism for DNN training
DNN training is extremely time-consuming, necessitating efficient multi-accelerator
parallelization. Current approaches to parallelizing training primarily use intra-batch …
GPipe: Efficient training of giant neural networks using pipeline parallelism
Scaling up deep neural network capacity has been known as an effective approach to
improving model quality for several different machine learning tasks. In many cases …
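The pipeline-parallel idea behind GPipe can be illustrated with a minimal single-process sketch: a batch is split into micro-batches that each flow through a chain of stages. The stage functions and helper below are hypothetical toy layers, not GPipe's API; in a real deployment each stage lives on its own accelerator so stages process different micro-batches concurrently.

```python
# Hedged sketch of micro-batch pipelining (GPipe-style), simulated in
# one process with plain functions standing in for model partitions.
def pipeline_forward(stages, batch, num_microbatches):
    size = len(batch) // num_microbatches
    micro = [batch[i * size:(i + 1) * size] for i in range(num_microbatches)]
    outputs = []
    for mb in micro:                  # on real hardware these overlap in time
        x = mb
        for stage in stages:          # each stage is one pipeline cell
            x = stage(x)
        outputs.append(x)
    # re-assemble micro-batch outputs into one batch
    return [y for out in outputs for y in out]

stages = [
    lambda xs: [v + 1 for v in xs],   # stage 0: toy layer
    lambda xs: [v * 2 for v in xs],   # stage 1: toy layer
]
print(pipeline_forward(stages, [1, 2, 3, 4], num_microbatches=2))  # → [4, 6, 8, 10]
```

The point of the micro-batch split is that once stage 0 finishes micro-batch 0, it can start micro-batch 1 while stage 1 works on micro-batch 0, keeping all devices busy.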
[BOOK][B] Automatic speech recognition
Automatic Speech Recognition (ASR), which is aimed to enable natural human–machine
interaction, has been an intensive research area for decades. Many core technologies, such …
[PDF][PDF] 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs.
We show empirically that in SGD training of deep neural networks, one can, at no or nearly
no loss of accuracy, quantize the gradients aggressively—to but one bit per value—if the …
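The core mechanism described in this abstract, aggressive gradient quantization made safe by error feedback, can be sketched in a few lines. This is an illustrative NumPy toy, not the paper's implementation; the per-tensor scale and function name are assumptions.

```python
import numpy as np

def one_bit_sgd_step(grad, error):
    """Hedged sketch of 1-bit gradient quantization with error feedback:
    fold the previous step's quantization error into the gradient, send
    only the sign (scaled by the mean magnitude), and carry the residual
    forward so quantization error cancels out over successive steps."""
    corrected = grad + error
    scale = np.mean(np.abs(corrected))        # one float per tensor
    quantized = np.where(corrected >= 0, scale, -scale)
    new_error = corrected - quantized         # residual for the next step
    return quantized, new_error

# usage: each step communicates 1 bit per value plus one scale factor
err = np.zeros(2)
q, err = one_bit_sgd_step(np.array([1.0, -3.0]), err)
# q → [2.0, -2.0], err → [-1.0, -1.0]
```

Carrying `new_error` into the next step is what makes the 1-bit compression nearly lossless in expectation: mass dropped by quantization is retried rather than discarded.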
[HTML][HTML] Scalable distributed DNN training using commodity GPU cloud computing
N Ström - 2015 - amazon.science
We introduce a new method for scaling up distributed Stochastic Gradient Descent (SGD)
training of Deep Neural Networks (DNN). The method solves the well-known communication …
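One common way to attack the communication bottleneck this abstract refers to is threshold-based gradient compression with residual accumulation. The sketch below is a generic illustration of that family of techniques, not necessarily the exact method of this paper; the function name and fixed threshold `tau` are assumptions.

```python
import numpy as np

def threshold_compress(grad, residual, tau):
    """Hedged sketch of threshold-based gradient compression for
    data-parallel SGD: only entries whose (error-compensated) magnitude
    reaches tau are communicated, quantized to +/- tau; everything else
    stays in a local residual and is retried on later steps."""
    acc = grad + residual
    mask = np.abs(acc) >= tau
    sent = np.where(mask, np.sign(acc) * tau, 0.0)
    new_residual = acc - sent        # unsent gradient mass is never lost
    return sent, new_residual

sent, res = threshold_compress(np.array([0.5, -2.0, 1.5]), np.zeros(3), tau=1.0)
# sent → [0.0, -1.0, 1.0], res → [0.5, -1.0, 0.5]
```

In a real system only the (index, sign) pairs of the masked entries go on the wire, which is what makes the method viable over commodity cloud network links.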
Deep learning: methods and applications
This monograph provides an overview of general deep learning methodology and its
applications to a variety of signal and information processing tasks. The application areas …
PipeDream: Fast and efficient pipeline parallel DNN training
PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes
computation by pipelining execution across multiple machines. Its pipeline parallel …
Data movement is all you need: A case study on optimizing transformers
Transformers are one of the most important machine learning workloads today. Training one
is a very compute-intensive task, often taking days or weeks, and significant attention has …