Design, modeling, and evaluation of a scalable multi-level checkpointing system A Moody, G Bronevetsky, K Mohror, BR De Supinski SC'10: Proceedings of the 2010 ACM/IEEE International Conference for High …, 2010 | 827 | 2010 |
There goes the neighborhood: performance degradation due to nearby jobs A Bhatele, K Mohror, SH Langer, KE Isaacs Proceedings of the International Conference on High Performance Computing …, 2013 | 248 | 2013 |
Design and modeling of a non-blocking checkpointing system K Sato, N Maruyama, K Mohror, A Moody, T Gamblin, BR de Supinski, ... SC'12: Proceedings of the International Conference on High Performance …, 2012 | 146 | 2012 |
An ephemeral burst-buffer file system for scientific applications T Wang, K Mohror, A Moody, K Sato, W Yu SC'16: Proceedings of the International Conference for High Performance …, 2016 | 142 | 2016 |
MCREngine: A scalable checkpointing system using data-aware aggregation and compression TZ Islam, K Mohror, S Bagchi, A Moody, BR De Supinski, R Eigenmann SC'12: Proceedings of the International Conference on High Performance …, 2012 | 134 | 2012 |
A large-scale study of MPI usage in open-source HPC applications I Laguna, R Marshall, K Mohror, M Ruefenacht, A Skjellum, N Sultana Proceedings of the International Conference for High Performance Computing …, 2019 | 88 | 2019 |
VeloC: Towards high performance adaptive asynchronous checkpointing at large scale B Nicolae, A Moody, E Gonsiorowski, K Mohror, F Cappello 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2019 | 86 | 2019 |
The Popper convention: Making reproducible systems evaluation practical I Jimenez, M Sevilla, N Watkins, C Maltzahn, J Lofstead, K Mohror, ... 2017 IEEE International Parallel and Distributed Processing Symposium …, 2017 | 85 | 2017 |
A 1 PB/s file system to checkpoint three million MPI tasks R Rajachandrasekar, A Moody, K Mohror, DK Panda Proceedings of the 22nd International Symposium on High-Performance Parallel …, 2013 | 81 | 2013 |
A user-level InfiniBand-based file system and checkpoint strategy for burst buffers K Sato, K Mohror, A Moody, T Gamblin, BR De Supinski, N Maruyama, ... 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2014 | 80 | 2014 |
ADAPT: Algorithmic differentiation applied to floating-point precision tuning H Menon, MO Lam, D Osei-Kuffuor, M Schordan, S Lloyd, K Mohror, ... SC18: International Conference for High Performance Computing, Networking …, 2018 | 79 | 2018 |
Entropy-aware I/O pipelining for large-scale deep learning on HPC systems Y Zhu, F Chowdhury, H Fu, A Moody, K Mohror, K Sato, W Yu 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation …, 2018 | 70 | 2018 |
I/O characterization and performance evaluation of BeeGFS for deep learning F Chowdhury, Y Zhu, T Heer, S Paredes, A Moody, R Goldstone, ... Proceedings of the 48th International Conference on Parallel Processing, 1-10, 2019 | 67 | 2019 |
Evaluating and extending user-level fault tolerance in MPI applications I Laguna, DF Richards, T Gamblin, M Schulz, BR de Supinski, K Mohror, ... The International Journal of High Performance Computing Applications 30 (3 …, 2016 | 56 | 2016 |
Evaluating similarity-based trace reduction techniques for scalable performance analysis K Mohror, KL Karavanic Proceedings of the Conference on High Performance Computing Networking …, 2009 | 52 | 2009 |
Managing I/O interference in a shared burst buffer system S Thapaliya, P Bangalore, J Lofstead, K Mohror, A Moody 2016 45th International Conference on Parallel Processing (ICPP), 416-425, 2016 | 50 | 2016 |
Efficient user-level storage disaggregation for deep learning Y Zhu, W Yu, B Jiao, K Mohror, A Moody, F Chowdhury 2019 IEEE International Conference on Cluster Computing (CLUSTER), 1-12, 2019 | 41 | 2019 |
Ad hoc file systems for high-performance computing A Brinkmann, K Mohror, W Yu, P Carns, T Cortes, SA Klasky, A Miranda, ... Journal of Computer Science and Technology 35, 4-26, 2020 | 39 | 2020 |
FMI: Fault tolerant messaging interface for fast and transparent recovery K Sato, A Moody, K Mohror, T Gamblin, BR de Supinski, N Maruyama, ... 2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014 | 39 | 2014 |
Integrating database technology with comparison-based parallel performance diagnosis: The PerfTrack performance experiment management tool KL Karavanic, J May, K Mohror, B Miller, K Huck, R Knapp, B Pugh SC'05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, 39-39, 2005 | 38 | 2005 |