Convergence of edge computing and deep learning: A comprehensive survey

X Wang, Y Han, VCM Leung, D Niyato… - … Surveys & Tutorials, 2020 - ieeexplore.ieee.org
Ubiquitous sensors and smart devices from factories and communities are generating
massive amounts of data, and ever-increasing computing power is driving the core of …

Recent advances in deep learning for speech research at Microsoft

L Deng, J Li, JT Huang, K Yao, D Yu… - … on acoustics, speech …, 2013 - ieeexplore.ieee.org
Deep learning is becoming a mainstream technology for speech recognition at industrial
scale. In this paper, we provide an overview of the work by Microsoft speech researchers …

PipeDream: Generalized pipeline parallelism for DNN training

D Narayanan, A Harlap, A Phanishayee… - Proceedings of the 27th …, 2019 - dl.acm.org
DNN training is extremely time-consuming, necessitating efficient multi-accelerator
parallelization. Current approaches to parallelizing training primarily use intra-batch …

GPipe: Efficient training of giant neural networks using pipeline parallelism

Y Huang, Y Cheng, A Bapna, O Firat… - Advances in neural …, 2019 - proceedings.neurips.cc
Scaling up deep neural network capacity has been known as an effective approach to
improving model quality for several different machine learning tasks. In many cases …
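
Both pipeline-parallelism entries above (PipeDream and GPipe) build on the same basic idea: partition a model into sequential stages on different accelerators and split each mini-batch into micro-batches so that stages can work on different micro-batches concurrently. The sketch below illustrates only a GPipe-style forward schedule; the stage and micro-batch counts are arbitrary, and the real systems also pipeline the backward pass (and, in PipeDream's case, interleave forward and backward work), so this is a simplified illustration rather than either paper's implementation.

```python
# Minimal sketch of a GPipe-style pipeline schedule (forward pass only).
# A mini-batch is split into micro-batches; at time step t, stage s works
# on micro-batch (t - s), so different stages overlap on different data.

def gpipe_forward_schedule(num_stages: int, num_microbatches: int):
    """Return, for each time step, the (stage, micro-batch) pairs that run."""
    schedule = []
    for t in range(num_stages + num_microbatches - 1):
        step = [(s, t - s) for s in range(num_stages)
                if 0 <= t - s < num_microbatches]
        schedule.append(step)
    return schedule

if __name__ == "__main__":
    # 4 stages, 8 micro-batches: the pipeline fills for 3 steps, runs with
    # all stages busy, then drains as the last micro-batches finish.
    for t, step in enumerate(gpipe_forward_schedule(4, 8)):
        print(f"t={t:2d}: " + ", ".join(f"stage{s}<-mb{m}" for s, m in step))
```

The idle slots at the start and end of this schedule are the pipeline "bubble": with p stages and m micro-batches, roughly (p - 1) / (m + p - 1) of the slots are wasted, which is why both papers push toward many micro-batches per mini-batch.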

[BOOK] Automatic speech recognition

D Yu, L Deng - 2016 - Springer
Automatic Speech Recognition (ASR), which aims to enable natural human–machine
interaction, has been an intensive research area for decades. Many core technologies, such …

[PDF] 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs.

F Seide, H Fu, J Droppo, G Li, D Yu - Interspeech, 2014 - isca-archive.org
We show empirically that in SGD training of deep neural networks, one can, at no or nearly
no loss of accuracy, quantize the gradients aggressively—to but one bit per value—if the …
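
The snippet above already names the key mechanism: each gradient value is quantized to a single bit before being exchanged between data-parallel workers, and the quantization error is carried forward into the next mini-batch's gradient so that little information is lost over time. Below is a rough NumPy illustration of such an error-feedback 1-bit quantizer; the sign-based encoding and the per-tensor mean reconstruction values are simplifications for readability, not the paper's exact column-wise scheme.

```python
import numpy as np

def one_bit_quantize(grad, error):
    """Quantize a gradient to one bit per value, with error feedback.

    grad  : current gradient tensor
    error : quantization error carried over from the previous step
    Returns (bits, reconstructed, new_error).
    """
    g = grad + error                     # fold in last step's residual
    bits = g >= 0.0                      # the single bit: sign of each value
    pos, neg = g[bits], g[~bits]
    pos_val = pos.mean() if pos.size else 0.0
    neg_val = neg.mean() if neg.size else 0.0
    reconstructed = np.where(bits, pos_val, neg_val)
    new_error = g - reconstructed        # whatever was lost, feed back next step
    return bits, reconstructed, new_error

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.normal(size=8).astype(np.float32)
    bits, recon, err = one_bit_quantize(grad, np.zeros_like(grad))
    print("gradient     :", np.round(grad, 3))
    print("1-bit recon  :", np.round(recon, 3))
    print("carried error:", np.round(err, 3))
```

In a data-parallel setup, only the bit array plus the two reconstruction values need to be communicated per tensor, which is where the large bandwidth saving comes from.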

[HTML] Scalable distributed DNN training using commodity GPU cloud computing

N Ström - 2015 - amazon.science
We introduce a new method for scaling up distributed Stochastic Gradient Descent (SGD)
training of Deep Neural Networks (DNN). The method solves the well-known communication …
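
For context, the communication problem this entry refers to arises because, in synchronous data-parallel SGD, every worker computes a gradient on its shard of the mini-batch, the gradients are averaged across workers (an all-reduce), and all workers apply the same update, so the full gradient must be exchanged every step. The toy single-process sketch below simulates that loop with a made-up linear model and four simulated workers; it shows the structure of the communication in plain synchronous SGD, not the method the paper proposes to reduce it.

```python
import numpy as np

def worker_gradient(w, x_shard, y_shard):
    """Mean-squared-error gradient for a linear model on one worker's data shard."""
    residual = x_shard @ w - y_shard
    return x_shard.T @ residual / len(y_shard)

def data_parallel_sgd_step(w, shards, lr=0.1):
    """One synchronous step: local gradients, then an average that stands in
    for the all-reduce whose cost distributed-training papers try to cut."""
    grads = [worker_gradient(w, x, y) for x, y in shards]
    return w - lr * np.mean(grads, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -3.0])
    x = rng.normal(size=(256, 2))
    y = x @ true_w + 0.01 * rng.normal(size=256)
    shards = list(zip(np.split(x, 4), np.split(y, 4)))   # 4 simulated workers
    w = np.zeros(2)
    for _ in range(100):
        w = data_parallel_sgd_step(w, shards)
    print("estimated weights:", np.round(w, 3))          # close to [2, -3]
```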

Deep learning: methods and applications

L Deng, D Yu - Foundations and trends® in signal processing, 2014 - nowpublishers.com
This monograph provides an overview of general deep learning methodology and its
applications to a variety of signal and information processing tasks. The application areas …

PipeDream: Fast and efficient pipeline parallel DNN training

A Harlap, D Narayanan, A Phanishayee… - arXiv preprint arXiv …, 2018 - arxiv.org
PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes
computation by pipelining execution across multiple machines. Its pipeline parallel …

Data movement is all you need: A case study on optimizing transformers

A Ivanov, N Dryden, T Ben-Nun, S Li… - … of Machine Learning …, 2021 - proceedings.mlsys.org
Transformers are one of the most important machine learning workloads today. Training one
is a very compute-intensive task, often taking days or weeks, and significant attention has …