Architecture and performance of Devito, a system for automated stencil computation F Luporini, M Louboutin, M Lange, N Kukreja, P Witte, J Hückelheim, ... ACM Transactions on Mathematical Software (TOMS) 46 (1), 1-28, 2020 | 145 | 2020 |
Vector friendly instruction format and execution thereof RC Valentine, JC San Adrian, RE Sans, RD Cavin, BL Toll, SG Duran, ... US Patent App. 13/976,707, 2013 | 113 | 2013 |
YASK—yet another stencil kernel: A framework for HPC stencil code-generation and tuning C Yount, J Tobin, A Breuer, A Duran 2016 Sixth International Workshop on Domain-Specific Languages and High …, 2016 | 77 | 2016 |
Using model trees for computer architecture performance analysis of software applications EM Ould-Ahmed-Vall, J Woodlee, C Yount, KA Doshi, S Abraham 2007 IEEE International Symposium on Performance Analysis of Systems …, 2007 | 75 | 2007 |
A methodology for the rapid injection of transient hardware errors CR Yount, DP Siewiorek IEEE Transactions on Computers 45 (8), 881-891, 1996 | 69 | 1996 |
Vector Folding: improving stencil performance via multi-dimensional SIMD-vector representation C Yount 2015 IEEE 17th international conference on high performance computing and …, 2015 | 53 | 2015 |
Characterization and optimization methodology applied to stencil computations C Andreolli, P Thierry, L Borges, G Skinner, C Yount, J Jeffers, J Reinders High Performance Parallelism Pearls, 377-396, 2015 | 43 | 2015 |
Gather-op instruction to duplicate a mask and perform an operation on vector elements gathered via tracked offset-based gathering E Ould-Ahmed-Vall, KA Doshi, CR Yount, S Sair US Patent 9,747,101, 2017 | 38 | 2017 |
Instruction and logic to provide vector horizontal majority voting functionality E Ould-Ahmed-Vall, KA Doshi, S Sair, CR Yount US Patent 9,448,794, 2016 | 30 | 2016 |
Instruction and logic to provide stride-based vector load-op functionality with mask duplication E Ould-Ahmed-Vall, KA Doshi, S Sair, CR Yount US Patent 9,804,844, 2017 | 29 | 2017 |
Efficient zero-based decompression E Ould-Ahmed-Vall, S Sair, KA Doshi, CR Yount, BL Toll US Patent 9,575,757, 2017 | 28 | 2017 |
Effective use of large high-bandwidth memory caches in HPC stencil computation via temporal wave-front tiling C Yount, A Duran 2016 7th International Workshop on Performance Modeling, Benchmarking and …, 2016 | 28 | 2016 |
Method to assess energy efficiency of HPC system operated with and without power constraints D Bodas, M Arunachalam, I Sharapov, CR Yount, SB Huck, R Huggahalli, ... US Patent 9,971,391, 2018 | 21 | 2018 |
Characterization of SPEC CPU2006 and SPEC OMP2001: Regression models and their transferability EM Ould-Ahmed-Vall, KA Doshi, C Yount, J Woodlee ISPASS 2008-IEEE International Symposium on Performance Analysis of Systems …, 2008 | 20 | 2008 |
Multi-level spatial and temporal tiling for efficient HPC stencil computation on many-core processors with large shared caches C Yount, A Duran, J Tobin Future Generation Computer Systems 92, 903-919, 2019 | 19 | 2019 |
Vector friendly instruction format and execution thereof RC Valentine, JC San Adrian, RE Sans, RD Cavin, BL Toll, SG Duran, ... US Patent 9,513,917, 2016 | 16 | 2016 |
Genetic algorithm based auto-tuning of seismic applications on multi and manycore computers C Andreolli, P Thierry, L Borges, C Yount, G Skinner EAGE workshop on high performance computing for upstream, cp-426-00017, 2014 | 16 | 2014 |
Accelerating seismic simulations using the Intel Xeon Phi Knights Landing processor J Tobin, A Breuer, A Heinecke, C Yount, Y Cui International Conference on High Performance Computing, 139-157, 2017 | 14 | 2017 |
Graph-matching-based simulation-region selection for multiple binaries C Yount, H Patil, MS Islam, A Srikanth 2015 IEEE International Symposium on Performance Analysis of Systems and …, 2015 | 14 | 2015 |
Instructions and Logic for Load-Indices-and-Prefetch-Gathers Operations CR Yount, AC Valles, IM Gokhale, E Ould-Ahmed-Vall US Patent App. 14/977,356, 2017 | 12 | 2017 |