Model-driven autotuning of sparse matrix-vector multiply on GPUs JW Choi, A Singh, RW Vuduc ACM SIGPLAN Notices 45 (5), 115-126, 2010 | 566 | 2010 |
A roofline model of energy JW Choi, D Bedard, R Fowler, R Vuduc 2013 IEEE 27th International Symposium on Parallel and Distributed …, 2013 | 196 | 2013 |
On the limits of GPU acceleration R Vuduc, A Chandramowlishwaran, J Choi, M Guney, A Shringarpure Proceedings of the 2nd USENIX Conference on Hot Topics in Parallelism 13 (0), 2010 | 178 | 2010 |
FROSTT: The formidable repository of open sparse tensors and tools S Smith, JW Choi, J Li, R Vuduc, J Park, X Liu, G Karypis | 158 | 2017 |
Algorithmic time, energy, and power on candidate HPC compute building blocks J Choi, M Dukhan, X Liu, R Vuduc 2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014 | 100 | 2014 |
Performance analysis and tuning for general purpose graphics processing units (GPGPU) H Kim, R Vuduc, S Baghsorkhi Morgan & Claypool Publishers, 2012 | 73 | 2012 |
Model-driven sparse CP decomposition for higher-order tensors J Li, J Choi, I Perros, J Sun, R Vuduc 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2017 | 66 | 2017 |
Sparse Matrix-Vector Multiplication on Multicore and Accelerators. S Williams, N Bell, JW Choi, M Garland, L Oliker, R Vu Scientific Computing with Multicore and Accelerators, 83-109, 2010 | 37 | 2010 |
Blocking optimization techniques for sparse tensor computation J Choi, X Liu, S Smith, T Simon 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2018 | 36 | 2018 |
On optimizing distributed Tucker decomposition for dense tensors VT Chakaravarthy, JW Choi, DJ Joseph, X Liu, P Murali, Y Sabharwal, ... 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2017 | 34 | 2017 |
High-performance dense Tucker decomposition on GPU clusters J Choi, X Liu, V Chakaravarthy SC18: International Conference for High Performance Computing, Networking …, 2018 | 32 | 2018 |
ALTO: Adaptive linearized storage of sparse tensors AE Helal, J Laukemann, F Checconi, JJ Tithi, T Ranadive, F Petrini, ... Proceedings of the ACM International Conference on Supercomputing, 404-416, 2021 | 30 | 2021 |
A CPU-GPU hybrid implementation and model-driven scheduling of the fast multipole method J Choi, A Chandramowlishwaran, K Madduri, R Vuduc Proceedings of Workshop on General Purpose Processing Using GPUs, 64-71, 2014 | 29 | 2014 |
On optimizing distributed Tucker decomposition for sparse tensors VT Chakaravarthy, JW Choi, DJ Joseph, P Murali, SS Pandian, ... Proceedings of the 2018 International Conference on Supercomputing, 374-384, 2018 | 26 | 2018 |
A brief history and introduction to GPGPU R Vuduc, J Choi Modern Accelerator Technologies for Geographic Information Science, 9-23, 2013 | 23 | 2013 |
High-performance lattice QCD for multi-core based parallel systems using a cache-friendly hybrid threaded-MPI approach M Smelyanskiy, K Vaidyanathan, J Choi, B Joó, J Chhugani, MA Clark, ... Proceedings of 2011 International Conference for High Performance Computing …, 2011 | 21 | 2011 |
Brief announcement: Towards a communication optimal fast multipole method and its implications at exascale A Chandramowlishwaran, JW Choi, K Madduri, R Vuduc Proceedings of the twenty-fourth annual ACM symposium on Parallelism in …, 2012 | 20 | 2012 |
How much (execution) time and energy does my algorithm cost? JW Choi, RW Vuduc XRDS: Crossroads, The ACM Magazine for Students 19 (3), 49-51, 2013 | 12 | 2013 |
Efficient, out-of-memory sparse MTTKRP on massively parallel architectures A Nguyen, AE Helal, F Checconi, J Laukemann, JJ Tithi, Y Soh, ... Proceedings of the 36th ACM International Conference on Supercomputing, 1-13, 2022 | 9 | 2022 |
Data analytics with NVLink: An SpMV case study D Buono, F Artico, F Checconi, JW Choi, X Que, L Schneidenbach Proceedings of the Computing Frontiers Conference, 89-96, 2017 | 9 | 2017 |