Improving GPGPU concurrency with elastic kernels S Pai, MJ Thazhuthaveetil, R Govindarajan ACM SIGARCH Computer Architecture News 41 (1), 407-418, 2013 | 281 | 2013 |
Groute: An asynchronous multi-GPU programming model for irregular computations T Ben-Nun, M Sutton, S Pai, K Pingali ACM SIGPLAN Notices 52 (8), 235-248, 2017 | 161 | 2017 |
A compiler for throughput optimization of graph algorithms on GPUs S Pai, K Pingali Proceedings of the 2016 ACM SIGPLAN International Conference on Object …, 2016 | 119 | 2016 |
Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme S Pai, R Govindarajan, MJ Thazhuthaveetil Proceedings of the 21st international conference on Parallel architectures …, 2012 | 66 | 2012 |
Controlled kernel launch for dynamic parallelism in GPUs X Tang, A Pattnaik, H Jiang, O Kayiran, A Jog, S Pai, M Ibrahim, ... 2017 IEEE International Symposium on High Performance Computer Architecture …, 2017 | 61 | 2017 |
Parallel triangle counting and k-truss identification using graph-centric methods C Voegele, YS Lu, S Pai, K Pingali 2017 IEEE High Performance Extreme Computing Conference (HPEC), 1-7, 2017 | 47 | 2017 |
Stochastic gradient descent on GPUs R Kaleem, S Pai, K Pingali Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 81-89, 2015 | 43 | 2015 |
Why GPUs are slow at executing NFAs and how to make them faster H Liu, S Pai, A Jog Proceedings of the Twenty-Fifth International Conference on Architectural …, 2020 | 28 | 2020 |
Locality analysis through static parallel sampling D Chen, F Liu, C Ding, S Pai ACM SIGPLAN Notices 53 (4), 557-570, 2018 | 26 | 2018 |
Architectural support for efficient large-scale automata processing H Liu, M Ibrahim, O Kayiran, S Pai, A Jog 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture …, 2018 | 24 | 2018 |
PLASMA: Portable programming for SIMD heterogeneous accelerators S Pai, R Govindarajan, MJ Thazhuthaveetil Workshop on Language, Compiler, and Architecture Support for GPGPU, held in …, 2010 | 24 | 2010 |
Bounded exhaustive test-input generation on GPUs A Celik, S Pai, S Khurshid, M Gligoric Proceedings of the ACM on Programming Languages 1 (OOPSLA), 1-25, 2017 | 21 | 2017 |
Synchronization trade-offs in GPU implementations of graph algorithms R Kaleem, A Venkat, S Pai, M Hall, K Pingali 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2016 | 21 | 2016 |
Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels S Pai, R Govindarajan, MJ Thazhuthaveetil Proceedings of the 23rd international conference on Parallel architectures …, 2014 | 20 | 2014 |
Groute: Asynchronous multi-GPU programming model with applications to large-scale graph processing T Ben-Nun, M Sutton, S Pai, K Pingali ACM Transactions on Parallel Computing (TOPC) 7 (3), 1-27, 2020 | 15 | 2020 |
One size doesn't fit all: Quantifying performance portability of graph applications on GPUs T Sorensen, S Pai, AF Donaldson 2019 IEEE International Symposium on Workload Characterization (IISWC), 155-166, 2019 | 10 | 2019 |
Efficient execution of graph algorithms on CPU with SIMD extensions R Zheng, S Pai 2021 IEEE/ACM International Symposium on Code Generation and Optimization …, 2021 | 9 | 2021 |
Horus: A modular GPU emulator framework AS Elhelw, S Pai 2020 IEEE International Symposium on Performance Analysis of Systems and …, 2020 | 6 | 2020 |
Adaptive work-efficient connected components on the GPU M Sutton, T Ben-Nun, A Barak, S Pai, K Pingali arXiv preprint arXiv:1612.01178, 2016 | 6 | 2016 |
Asynchronous Automata Processing on GPUs H Liu, S Pai, A Jog Proceedings of the ACM on Measurement and Analysis of Computing Systems 7 (1 …, 2023 | 3 | 2023 |