Addressing failures in exascale computing M Snir, RW Wisniewski, JA Abraham, SV Adve, S Bagchi, P Balaji, J Belak, ... The International Journal of High Performance Computing Applications 28 (2 …, 2014 | 525 | 2014 |
CIFTS: A coordinated infrastructure for fault-tolerant systems R Gupta, P Beckman, BH Park, E Lusk, P Hargrove, A Geist, D Panda, ... 2009 International Conference on Parallel Processing, 237-245, 2009 | 104 | 2009 |
Co-analysis of RAS log and job log on Blue Gene/P Z Zheng, L Yu, W Tang, Z Lan, R Gupta, N Desai, S Coghlan, D Buettner 2011 IEEE International Parallel & Distributed Processing Symposium, 840-851, 2011 | 103 | 2011 |
Method and apparatus for utilizing closed captioned (CC) text keywords or phrases for the purpose of automated searching of network-based resources for interactive links to … V Shastri, A Arya, S Sampath, R Bharadwaj, P Gupta US Patent App. 09/727,837, 2001 | 89 | 2001 |
A practical failure prediction with location and lead time for Blue Gene/P Z Zheng, Z Lan, R Gupta, S Coghlan, P Beckman 2010 International Conference on Dependable Systems and Networks Workshops …, 2010 | 88 | 2010 |
LogAider: A tool for mining potential correlations of HPC log events S Di, R Gupta, M Snir, E Pershey, F Cappello 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2017 | 62 | 2017 |
MPI-aware networking infrastructure R Gupta, T Abels US Patent App. 11/147,783, 2006 | 58 | 2006 |
Planning considerations for job scheduling in HPC clusters S Iqbal, R Gupta, YC Fang High-Performance Computing, reprinted from Dell Power Solutions, 2005 | 52 | 2005 |
Mapping communication layouts to network hardware characteristics on massive-scale Blue Gene systems P Balaji, R Gupta, A Vishnu, P Beckman Computer Science-Research and Development 26 (3), 247-256, 2011 | 47 | 2011 |
Evaluating power-monitoring capabilities on IBM Blue Gene/P and Blue Gene/Q K Yoshii, K Iskra, R Gupta, P Beckman, V Vishwanath, C Yu, S Coghlan 2012 IEEE International Conference on Cluster Computing, 36-44, 2012 | 37 | 2012 |
Efficient collective operations using remote memory operations on VIA-based clusters R Gupta, P Balaji, DK Panda, J Nieplocha Proceedings International Parallel and Distributed Processing Symposium, 9 pp., 2003 | 35 | 2003 |
Exascale workload characterization and architecture implications P Balaprakash, D Buntinas, A Chan, A Guha, R Gupta, SHK Narayanan, ... 2013 IEEE International Symposium on Performance Analysis of Systems and …, 2013 | 34 | 2013 |
System and method for intelligent information handling system cluster switches R Radhakrishnan, R Gupta US Patent App. 11/414,406, 2007 | 31 | 2007 |
Exploring properties and correlations of fatal events in a large-scale HPC system S Di, H Guo, R Gupta, ER Pershey, M Snir, F Cappello IEEE Transactions on Parallel and Distributed Systems 30 (2), 361-374, 2018 | 30 | 2018 |
Distributed monitoring and management of exascale systems in the Argo project S Perarnau, R Thakur, K Iskra, K Raffenetti, F Cappello, R Gupta, ... Distributed Applications and Interoperable Systems: 15th IFIP WG 6.1 …, 2015 | 29 | 2015 |
Efficient barrier using remote memory operations on VIA-based clusters R Gupta, V Tipparaju, J Nieplocha, D Panda Proceedings. IEEE International Conference on Cluster Computing, 83-90, 2002 | 27 | 2002 |
Systemwide power management with Argo D Ellsworth, T Patki, S Perarnau, S Seo, A Amer, J Zounmevo, R Gupta, ... 2016 IEEE International Parallel and Distributed Processing Symposium …, 2016 | 25 | 2016 |
System and method for push-push cable connection R Gupta, R Pepper US Patent 7,156,683, 2007 | 25 | 2007 |
La VALSE: Scalable Log Visualization for Fault Characterization in Supercomputers H Guo, S Di, R Gupta, T Peterka, F Cappello EGPGV@EuroVis, 91-100, 2018 | 21 | 2018 |
Analyzing checkpointing trends for applications on the IBM Blue Gene/P system HG Naik, R Gupta, P Beckman 2009 International Conference on Parallel Processing Workshops, 81-88, 2009 | 20 | 2009 |