The landscape of parallel computing research: A view from Berkeley. K Asanovic, R Bodik, BC Catanzaro, JJ Gebis, P Husbands, K Keutzer, et al. Technical Report UCB/EECS-2006-183, EECS Department, University of …, 2006. Cited by 3176.
Roofline: an insightful visual performance model for multicore architectures. S Williams, A Waterman, D Patterson. Communications of the ACM 52 (4), 65-76, 2009. Cited by 3019.
Optimization of sparse matrix-vector multiplication on emerging multicore platforms. S Williams, L Oliker, R Vuduc, J Shalf, K Yelick, J Demmel. Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, 1-12, 2007. Cited by 1064.
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. K Datta, M Murphy, V Volkov, S Williams, J Carter, L Oliker, D Patterson, et al. SC'08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, 1-12, 2008. Cited by 830.
The potential of the Cell processor for scientific computing. S Williams, J Shalf, L Oliker, S Kamil, P Husbands, K Yelick. Proceedings of the 3rd Conference on Computing Frontiers, 9-20, 2006. Cited by 491.
AMReX: a framework for block-structured adaptive mesh refinement. W Zhang, A Almgren, V Beckner, J Bell, J Blaschke, C Chan, M Day, et al. The Journal of Open Source Software 4 (37), 1370, 2019. Cited by 350.
Optimization and performance modeling of stencil computations on modern microprocessors. K Datta, S Kamil, S Williams, L Oliker, J Shalf, K Yelick. SIAM Review 51 (1), 129-159, 2009. Cited by 318.
An auto-tuning framework for parallel multicore stencil computations. S Kamil, C Chan, L Oliker, J Shalf, S Williams. 2010 IEEE International Symposium on Parallel & Distributed Processing …, 2010. Cited by 296.
Implicit and explicit optimizations for stencil computations. S Kamil, K Datta, S Williams, L Oliker, J Shalf, K Yelick. Proceedings of the 2006 Workshop on Memory System Performance and …, 2006. Cited by 197.
An efficient multicore implementation of a novel HSS-structured multifrontal solver using randomized sampling. P Ghysels, XS Li, FH Rouet, S Williams, A Napov. SIAM Journal on Scientific Computing 38 (5), S358-S384, 2016. Cited by 171.
Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication. A Buluç, S Williams, L Oliker, J Demmel. 2011 IEEE International Parallel & Distributed Processing Symposium, 721-733, 2011. Cited by 167.
Lattice Boltzmann simulation optimization on leading multicore platforms. S Williams, J Carter, L Oliker, J Shalf, K Yelick. 2008 IEEE International Symposium on Parallel and Distributed Processing, 1-14, 2008. Cited by 148.
Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis. YJ Lo, S Williams, B Van Straalen, TJ Ligocki, MJ Cordery, NJ Wright, et al. Performance Modeling, Benchmarking and Simulation of High Performance …, 2014. Cited by 144.
Auto-tuning performance on multicore computers. SW Williams. University of California, Berkeley, 2008. Cited by 142.
Scientific computing kernels on the Cell processor. S Williams, J Shalf, L Oliker, S Kamil, P Husbands, K Yelick. International Journal of Parallel Programming 35 (3), 263-298, 2007. Cited by 137.
Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication. A Azad, G Ballard, A Buluç, J Demmel, L Grigori, O Schwartz, S Toledo, et al. SIAM Journal on Scientific Computing 38 (6), C624-C651, 2016. Cited by 128.
Optimizing sparse matrix-multiple vectors multiplication for nuclear configuration interaction calculations. HM Aktulga, A Buluç, S Williams, C Yang. 2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014. Cited by 112.
Optimization of geometric multigrid for emerging multi- and manycore processors. S Williams, DD Kalamkar, A Singh, AM Deshpande, B Van Straalen, et al. SC'12: Proceedings of the International Conference on High Performance …, 2012. Cited by 91.
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures. A Chandramowlishwaran, S Williams, L Oliker, I Lashuk, G Biros, R Vuduc. 2010 IEEE International Symposium on Parallel & Distributed Processing …, 2010. Cited by 89.
Applying the Roofline performance model to the Intel Xeon Phi Knights Landing processor. D Doerfler, J Deslippe, S Williams, L Oliker, B Cook, T Kurth, M Lobet, et al. High Performance Computing: ISC High Performance 2016 International …, 2016. Cited by 82.