Understanding the GPU microarchitecture to achieve bare-metal performance tuning X Zhang, G Tan, S Xue, J Li, K Zhou, M Chen Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of …, 2017 | 67 | 2017 |
PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation J Ansel, E Yang, H He, N Gimelshein, A Jain, M Voznesensky, B Bao, ... Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 31 | 2024 |
Tools for top-down performance analysis of GPU-accelerated applications K Zhou, M Krentel, J Mellor-Crummey Proceedings of the 34th ACM International Conference on Supercomputing 26, 1–12, 2020 | 28 | 2020 |
A performance analysis framework for exploiting GPU microarchitectural capability K Zhou, G Tan, X Zhang, C Wang, N Sun Proceedings of the International Conference on Supercomputing, 1-10, 2017 | 21 | 2017 |
Multi-classes feature engineering with sliding window for purchase prediction in mobile commerce Q Li, M Gu, K Zhou, X Sun 2015 IEEE International Conference on Data Mining Workshop (ICDMW), 1048-1054, 2015 | 20 | 2015 |
GPA: A GPU Performance Advisor Based on Instruction Sampling K Zhou, X Meng, R Sai, J Mellor-Crummey 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2021 | 19 | 2021 |
GVProf: A value profiler for GPU-based clusters K Zhou, Y Hao, J Mellor-Crummey, X Meng, X Liu SC20: International Conference for High Performance Computing, Networking …, 2020 | 19 | 2020 |
Accelerating high‐order stencils on GPUs R Sai, J Mellor-Crummey, X Meng, K Zhou, M Araya-Polo, J Meng Concurrency and Computation: Practice and Experience 34 (20), 2021 | 17 | 2021 |
Measurement and analysis of GPU-accelerated applications with HPCToolkit K Zhou, L Adhianto, J Anderson, A Cherian, D Grubisic, M Krentel, Y Liu, ... Parallel Computing 108, 102837, 2021 | 12 | 2021 |
An automated tool for analysis and tuning of GPU-accelerated code in HPC applications K Zhou, X Meng, R Sai, D Grubisic, J Mellor-Crummey IEEE Transactions on Parallel and Distributed Systems 33 (4), 854-865, 2021 | 11 | 2021 |
Outcomes of OpenMP hackathon: OpenMP application experiences with the offloading model (part II) B Chapman, B Pham, C Yang, C Daley, C Bertoni, D Kulkarni, ... OpenMP: Enabling Massive Node-Level Parallelism: 17th International Workshop …, 2021 | 11 | 2021 |
ValueExpert: Exploring value patterns in GPU-Accelerated applications K Zhou, Y Hao, J Mellor-Crummey, X Meng, X Liu Proceedings of the 27th ACM International Conference on Architectural …, 2022 | 10 | 2022 |
Outcomes of OpenMP Hackathon: OpenMP Application Experiences with the Offloading Mode S Pophale, D Oryspayev, B Chapman, B Pham, C Yang, C Daley, ... Brookhaven National Lab.(BNL), Upton, NY (United States), 2021 | 6 | 2021 |
Quadboost: A scalable concurrent quadtree K Zhou, G Tan, W Zhou IEEE Transactions on Parallel and Distributed Systems 29 (3), 673-686, 2017 | 6 | 2017 |
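Research on a two-layer index architecture for cloud data processing based on concurrent skip lists 周维, 路劲, 周可人, 王世普, 姚绍文 Journal of Computer Research and Development 52 (7), 1531-1545, 2015 | 6 | 2015 |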
Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs AT Cherian, K Zhou, D Grubisic, X Meng, J Mellor-Crummey 2021 IEEE/ACM International Workshop on Programming and Performance …, 2021 | 4 | 2021 |
DrGPUM: Guiding Memory Optimization for GPU-Accelerated Applications M Lin, K Zhou, P Su Proceedings of the 28th ACM International Conference on Architectural …, 2023 | 3 | 2023 |
Low overhead and context sensitive profiling of GPU-accelerated applications K Zhou, J Anderson, X Meng, J Mellor-Crummey Proceedings of the 36th ACM International Conference on Supercomputing, 1-13, 2022 | 3 | 2022 |
Semi-supervised learning for shale image segmentation with fast normalized cut loss B Yin, Q Hu, Y Zhu, K Zhou Geoenergy Science and Engineering 229, 212039, 2023 | 2 | 2023 |
Hardware-aware compression with random operation access specific tile (ROAST) hashing A Desai, K Zhou, A Shrivastava International Conference on Machine Learning, 7732-7749, 2023 | 2 | 2023 |