Deep Learning Recommendation Model for Personalization and Recommendation Systems M Naumov, D Mudigere, HJM Shi, J Huang, N Sundaraman, J Park, ... arXiv preprint arXiv:1906.00091, 2019 | 677 | 2019 |
A Study of BFLOAT16 for Deep Learning Training D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ... arXiv preprint arXiv:1905.12322, 2019 | 317 | 2019 |
Strassen's algorithm reloaded J Huang, TM Smith, GM Henry, RA van de Geijn High Performance Computing, Networking, Storage and Analysis, SC16 …, 2016 | 84 | 2016 |
Software-hardware co-design for fast and scalable training of deep learning recommendation models D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ... Proceedings of the 49th Annual International Symposium on Computer …, 2022 | 82 | 2022 |
Performance optimization for the k-nearest neighbors kernel on x86 architectures CD Yu, J Huang, W Austin, B Xiao, G Biros Proceedings of the International Conference for High Performance Computing …, 2015 | 44 | 2015 |
FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference D Khudia, J Huang, P Basu, S Deng, H Liu, J Park, M Smelyanskiy arXiv preprint arXiv:2101.05615, 2021 | 40 | 2021 |
Mahmoud Khorashadi, Pallab Bhattacharya, Petr Lapukhov, Maxim Naumov, Ajit Mathews, Lin Qiao, Mikhail Smelyanskiy, Bill Jia, and Vijay Rao. 2021. Software-Hardware Co-design … D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ... arXiv preprint arXiv:2104.05158, 2022 | 38* | 2022 |
Generating families of practical fast matrix multiplication algorithms J Huang, L Rice, DA Matthews, RA van de Geijn 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2017 | 36 | 2017 |
Deep Learning Recommendation Model for Personalization and Recommendation Systems. CoRR abs/1906.00091 (2019) M Naumov, D Mudigere, HJM Shi, J Huang, N Sundaraman, J Park, ... arXiv preprint arXiv:1906.00091, 2019 | 35* | 2019 |
High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models D Mudigere, Y Hao, J Huang, A Tulloch, S Sridharan, X Liu, M Ozdal, ... arXiv preprint arXiv:2104.05158, 2021 | 33 | 2021 |
Mixed-Precision Embedding Using a Cache JA Yang, J Huang, J Park, PTP Tang, A Tulloch arXiv preprint arXiv:2010.11305, 2020 | 23 | 2020 |
Implementing Strassen's Algorithm with CUTLASS on NVIDIA Volta GPUs J Huang, CD Yu, RA van de Geijn arXiv preprint arXiv:1808.07984, 2018 | 22 | 2018 |
Strassen's Algorithm for Tensor Contraction J Huang, DA Matthews, RA van de Geijn SIAM Journal on Scientific Computing 40 (3), C305-C326, 2018 | 21 | 2018 |
Strassen’s Algorithm Reloaded on GPUs J Huang, CD Yu, RA van de Geijn ACM Transactions on Mathematical Software (TOMS) 46 (1), 1-22, 2020 | 20 | 2020 |
BLISlab: A Sandbox for Optimizing GEMM J Huang, RA van de Geijn arXiv preprint arXiv:1609.00076, 2016 | 15 | 2016 |
Efficient soft-error detection for low-precision deep learning recommendation models S Li, J Huang, PTP Tang, D Khudia, J Park, HD Dixit, Z Chen 2022 IEEE International Conference on Big Data (Big Data), 1556-1563, 2022 | 13 | 2022 |
A study of BFLOAT16 for deep learning training (2019) D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ... arXiv preprint arXiv:1905.12322, 2019 | 13 | 2019 |
Low-precision hardware architectures meet recommendation model inference at scale Z Deng, J Park, PTP Tang, H Liu, J Yang, H Yuen, J Huang, D Khudia, ... IEEE Micro 41 (5), 93-100, 2021 | 10 | 2021 |
Implementing Strassen’s Algorithm with BLIS J Huang, TM Smith, GM Henry, RA van de Geijn arXiv preprint arXiv:1605.01078, 2016 | 10* | 2016 |
Practical fast matrix multiplication algorithms J Huang The University of Texas at Austin, 2018 | 6 | 2018 |