SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for DNN training E Qin, A Samajdar, H Kwon, V Nadella, S Srinivasan, D Das, B Kaul, ... 2020 IEEE International Symposium on High Performance Computer Architecture …, 2020 | 415 | 2020 |
GraphMat: High performance graph analytics made productive N Sundaram, NR Satish, MMA Patwary, SR Dulloor, SG Vadlamudi, ... arXiv preprint arXiv:1503.07241, 2015 | 392 | 2015 |
A study of BFLOAT16 for deep learning training D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ... arXiv preprint arXiv:1905.12322, 2019 | 320 | 2019 |
Out-of-distribution detection using an ensemble of self supervised leave-out classifiers A Vyas, N Jammalamadaka, X Zhu, D Das, B Kaul, TL Willke Proceedings of the European Conference on Computer Vision (ECCV), 550-564, 2018 | 271 | 2018 |
ScaleDeep: A scalable compute architecture for learning and evaluating deep networks S Venkataramani, A Ranjan, S Banerjee, D Das, S Avancha, ... Proceedings of the 44th Annual International Symposium on Computer …, 2017 | 270 | 2017 |
Reconfigurable interface-based electrical architecture D Das, VK Agrawal, S Rajappan US Patent 8,930,036, 2015 | 213 | 2015 |
Distributed deep learning using synchronous stochastic gradient descent D Das, S Avancha, D Mudigere, K Vaidyanathan, S Sridharan, D Kalamkar, ... arXiv preprint arXiv:1602.06709, 2016 | 210 | 2016 |
Mixed precision training of convolutional neural networks using integer operations D Das, N Mellempudi, D Mudigere, D Kalamkar, S Avancha, K Banerjee, ... arXiv preprint arXiv:1802.00930, 2018 | 191 | 2018 |
Ternary neural networks with fine-grained quantization N Mellempudi, A Kundu, D Mudigere, D Das, B Kaul, P Dubey arXiv preprint arXiv:1705.01462, 2017 | 133 | 2017 |
Parallel efficient sparse matrix-matrix multiplication on multicore platforms MMA Patwary, NR Satish, N Sundaram, J Park, MJ Anderson, ... International Conference on High Performance Computing, 48-57, 2015 | 80 | 2015 |
Mixed precision training with 8-bit floating point N Mellempudi, S Srinivasan, D Das, B Kaul arXiv preprint arXiv:1905.12334, 2019 | 74 | 2019 |
Abstraction layers for scalable distributed machine learning DD Kalamkar, K Vaidyanathan, S Sridharan, D Das US Patent 11,094,029, 2021 | 69 | 2021 |
Communication optimizations for distributed machine learning S Sridharan, K Vaidyanathan, D Das, C Sakthivel, ME Smorkalov US Patent 11,270,201, 2022 | 64 | 2022 |
Apparatuses, methods, and systems for neural networks S Venkataramani, D Das, A Ranjan, S Banerjee, S Avancha, ... US Patent App. 16/317,497, 2019 | 56 | 2019 |
Improving concurrency and asynchrony in multithreaded MPI applications using software offloading K Vaidyanathan, DD Kalamkar, K Pamnany, JR Hammond, P Balaji, ... Proceedings of the International Conference for High Performance Computing …, 2015 | 54 | 2015 |
Hardware implemented point to point communication primitives for machine learning S Sridharan, K Vaidyanathan, D Das US Patent 11,488,008, 2022 | 53 | 2022 |
Dynamic precision management for integer deep learning primitives N Mellempudi, D Mudigere, D Das, S Sridharan US Patent 10,643,297, 2020 | 48 | 2020 |
Optimized compute hardware for machine learning operations D Das, R Gramunt, M Smelyanskiy, J Corbal, D Mudigere, NK Mellempudi, ... US Patent 10,776,699, 2020 | 47 | 2020 |
Scaling half-precision floating point tensors for training deep neural networks N Mellempudi, D Das US Patent 11,501,139, 2022 | 45 | 2022 |
X-MANN: A crossbar based architecture for memory augmented neural networks A Ranjan, S Jain, JR Stevens, D Das, B Kaul, A Raghunathan Proceedings of the 56th Annual Design Automation Conference 2019, 1-6, 2019 | 42 | 2019 |