Toward exascale resilience: 2014 update

F Cappello, A Geist, W Gropp, S Kale, B Kramer… - … and Innovations: an …, 2014 - dl.acm.org
Resilience is a major roadblock for HPC executions on future exascale systems. These
systems will typically gather millions of CPU cores running up to a billion threads …

Supervision-by-registration: An unsupervised approach to improve the precision of facial landmark detectors

X Dong, SI Yu, X Weng, SE Wei… - Proceedings of the …, 2018 - openaccess.thecvf.com
In this paper, we present supervision-by-registration, an unsupervised approach to improve
the precision of facial landmark detectors on both images and video. Our key observation is …

Variability mitigation in nanometer CMOS integrated systems: A survey of techniques from circuits to software

A Rahimi, L Benini, RK Gupta - Proceedings of the IEEE, 2016 - ieeexplore.ieee.org
Variation in performance and power across manufactured parts and their operating
conditions is an accepted reality in modern microelectronic manufacturing processes with …

Using benchmarks for radiation testing of microprocessors and FPGAs

H Quinn, WH Robinson, P Rech… - IEEE transactions on …, 2015 - ieeexplore.ieee.org
Performance benchmarks have been used over the years to compare different systems.
These benchmarks can be useful for researchers trying to determine how changes to the …

Evaluating the impact of SDC on the GMRES iterative solver

J Elliott, M Hoemmen, F Mueller - 2014 IEEE 28th international …, 2014 - ieeexplore.ieee.org
Increasing parallelism and transistor density, along with increasingly tighter energy and
peak power constraints, may force exposure of occasionally incorrect computation or …

×pipes Lite: a synthesis oriented design library for networks on chips

S Stergiou, F Angiolini, S Carta, L Raffo… - … Automation and Test …, 2005 - ieeexplore.ieee.org
The limited scalability of current bus topologies for systems on chips (SoCs) dictates the
adoption of networks on chips (NoCs) as a scalable interconnection scheme. Current SoCs …

Adaptive impact-driven detection of silent data corruption for HPC applications

S Di, F Cappello - IEEE Transactions on Parallel and …, 2016 - ieeexplore.ieee.org
For exascale HPC applications, silent data corruption (SDC) is one of the most dangerous
problems because there is no indication that there are errors during the execution. We …

Resilience design patterns: A structured approach to resilience at extreme scale

S Hukerikar, C Engelmann - arXiv preprint arXiv:1708.07422, 2017 - arxiv.org
Reliability is a serious concern for future extreme-scale high-performance computing (HPC)
systems. While the HPC community has developed various resilience solutions, the solution …

The use of imprecise processing to improve accuracy in weather & climate prediction

PD Düben, H McNamara, TN Palmer - Journal of Computational Physics, 2014 - Elsevier
The use of stochastic processing hardware and low precision arithmetic in atmospheric
models is investigated. Stochastic processors allow hardware-induced faults in calculations …

RAFTing MapReduce: Fast recovery on the RAFT

JA Quiané-Ruiz, C Pinkel, J Schad… - 2011 IEEE 27th …, 2011 - ieeexplore.ieee.org
MapReduce is a computing paradigm that has gained a lot of popularity as it allows non-
expert users to easily run complex analytical tasks at very large-scale. At such scale, task …