Dehao Chen
Verified email at google.com
Title
Cited by
Year
GPipe: Efficient training of giant neural networks using pipeline parallelism
Y Huang, Y Cheng, A Bapna, O Firat, D Chen, M Chen, HJ Lee, J Ngiam, ...
Advances in neural information processing systems 32, 2019
Cited by: 1540; Year: 2019
LaMDA: Language models for dialog applications
R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ...
arXiv preprint arXiv:2201.08239, 2022
Cited by: 1245; Year: 2022
GShard: Scaling giant models with conditional computation and automatic sharding
D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ...
arXiv preprint arXiv:2006.16668, 2020
Cited by: 780; Year: 2020
MLPerf training benchmark
P Mattson, C Cheng, G Diamos, C Coleman, P Micikevicius, D Patterson, ...
Proceedings of Machine Learning and Systems 2, 336-349, 2020
Cited by: 312; Year: 2020
MapCG: Writing parallel program portable between CPU and GPU
C Hong, D Chen, W Chen, W Zheng, H Lin
Proceedings of the 19th international conference on Parallel architectures …, 2010
Cited by: 224; Year: 2010
Lingvo: a modular and scalable framework for sequence-to-sequence modeling
J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ...
arXiv preprint arXiv:1902.08295, 2019
Cited by: 202; Year: 2019
Image classification at supercomputer scale
C Ying, S Kumar, D Chen, T Wang, Y Cheng
arXiv preprint arXiv:1811.06992, 2018
Cited by: 149; Year: 2018
AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications
D Chen, DX Li, T Moseley
Proceedings of the 2016 International Symposium on Code Generation and …, 2016
Cited by: 116; Year: 2016
GSPMD: general and scalable parallelization for ML computation graphs
Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang, R Joshi, M Krikun, ...
arXiv preprint arXiv:2105.04663, 2021
Cited by: 98; Year: 2021
Renelito Delos Santos
R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ...
Cited by: 94; Year: 2022
Taming hardware event samples for FDO compilation
D Chen, N Vachharajani, R Hundt, S Liao, V Ramasamy, P Yuan, W Chen, ...
Proceedings of the 8th annual IEEE/ACM international symposium on Code …, 2010
Cited by: 86; Year: 2010
Tree partition based parallel frequent pattern mining on shared memory systems
D Chen, C Lai, W Hu, WG Chen, Y Zhang, W Zheng
Proceedings 20th IEEE International Parallel & Distributed Processing …, 2006
Cited by: 53; Year: 2006
Taming hardware event samples for precise and versatile feedback directed optimizations
D Chen, N Vachharajani, R Hundt, X Li, S Eranian, W Chen, W Zheng
IEEE Transactions on Computers 62 (2), 376-389, 2011
Cited by: 49; Year: 2011
Scale MLPerf-0.6 models on Google TPU-v3 Pods
S Kumar, V Bitorff, D Chen, C Chou, B Hechtman, HJ Lee, N Kumar, ...
arXiv preprint arXiv:1909.09756, 2019
Cited by: 38; Year: 2019
Overlap communication with dependent computation via decomposition in large deep learning models
S Wang, J Wei, A Sabne, A Davis, B Ilbeyi, B Hechtman, D Chen, ...
Proceedings of the 28th ACM International Conference on Architectural …, 2022
Cited by: 32; Year: 2022
Automatic cross-replica sharding of weight update in data-parallel training
Y Xu, HJ Lee, D Chen, H Choi, B Hechtman, S Wang
arXiv preprint arXiv:2004.13336, 2020
Cited by: 27; Year: 2020
Feedback-directed optimizations in GCC with estimated edge profiles from hardware event sampling
V Ramasamy, P Yuan, D Chen, R Hundt
Proceedings of GCC Summit, 87-102, 2008
Cited by: 22; Year: 2008
Providing source code level portability between CPU and GPU with MapCG
CT Hong, DH Chen, YB Chen, WG Chen, WM Zheng, HB Lin
Journal of Computer Science and Technology 27 (1), 42-56, 2012
Cited by: 21; Year: 2012
Compile-time feedback-directed optimizations using estimated edge profiles from hardware-event sampling
R Hundt, V Ramasamy, D Chen
US Patent 8,387,026, 2013
Cited by: 20; Year: 2013
Exploring the limits of Concurrency in ML Training on Google TPUs
S Kumar, Y Wang, C Young, J Bradbury, N Kumar, D Chen, A Swing
Proceedings of Machine Learning and Systems 3, 81-92, 2021
Cited by: 18; Year: 2021