Title | Authors | Venue | Cited by | Year |
Large batch training of convolutional networks | Y You, I Gitman, B Ginsburg | arXiv preprint arXiv:1708.03888, 2017 | 1170* | 2017 |
Understanding the role of momentum in stochastic gradient methods | I Gitman, H Lang, P Zhang, L Xiao | Advances in Neural Information Processing Systems 32, 2019 | 105 | 2019 |
Comparison of batch normalization and weight normalization algorithms for the large-scale image classification | I Gitman, B Ginsburg | arXiv preprint arXiv:1709.08145, 2017 | 73 | 2017 |
Mixed-precision training for NLP and speech recognition with OpenSeq2Seq | O Kuchaiev, B Ginsburg, I Gitman, V Lavrukhin, J Li, H Nguyen, C Case, ... | arXiv preprint arXiv:1805.10387, 2018 | 46 | 2018 |
OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models | O Kuchaiev, B Ginsburg, I Gitman, V Lavrukhin, C Case, P Micikevicius | Proceedings of Workshop for NLP Open Source Software (NLP-OSS), 41-46, 2018 | 40 | 2018 |
Large batch training of convolutional networks with layer-wise adaptive rate scaling | B Ginsburg, I Gitman, Y You | | 21 | 2018 |
Novel prediction techniques based on clusterwise linear regression | I Gitman, J Chen, E Lei, A Dubrawski | arXiv preprint arXiv:1804.10742, 2018 | 13 | 2018 |
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | S Toshniwal, I Moshkov, S Narenthiran, D Gitman, F Jia, I Gitman | arXiv preprint arXiv:2402.10176, 2024 | 11 | 2024 |
Scaling SGD batch size to 32k for ImageNet training | Y You, I Gitman, B Ginsburg | CoRR abs/1708.03888 (2017); arXiv preprint arXiv:1708.03888, 2017 | 9 | 2017 |
Convergence analysis of gradient descent algorithms with proportional updates | I Gitman, D Dilipkumar, B Parr | arXiv preprint arXiv:1801.03137, 2018 | 5 | 2018 |
Confidence-based ensembles of end-to-end speech recognition models | I Gitman, V Lavrukhin, A Laptev, B Ginsburg | arXiv preprint arXiv:2306.15824, 2023 | 1 | 2023 |
Powerful and Extensible WFST Framework for RNN-Transducer Losses | A Laptev, V Bataev, I Gitman, B Ginsburg | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 1 | 2023 |
Nemotron-4 340B Technical Report | B Adler, N Agarwal, A Aithal, DH Anh, P Bhattacharya, A Brundyn, ... | arXiv preprint arXiv:2406.11704, 2024 | | 2024 |
Canonical Least Squares Clustering on Sparse Medical Data | I Gitman, J Chen, A Dubrawski | | | |