Understanding GPU Memory Corruption at Extreme Scale: The Summit Case Study

V Oles, A Schmedding, G Ostrouchov, W Shin… - Proceedings of the 38th …, 2024 - dl.acm.org
GPU memory corruption and in particular double-bit errors (DBEs) remain one of the least
understood aspects of HPC system reliability. Albeit rare, their occurrences always lead to …

[CITATION][C] Comments on “specifying prior distributions in reliability applications” by Tian, Lewis‐Beck, Niemi, and Meeker

J Min, Z Lin, Y Hong - Applied Stochastic Models in Business …, 2024 - Wiley Online Library
The authors are to be commended for their important contributions to the area of Bayesian
reliability analysis, especially on setting priors for reliability models. In the following …