SuperNeurons: Dynamic GPU memory management for training deep neural networks L Wang, J Ye, Y Zhao, W Wu, A Li, SL Song, Z Xu, T Kraska Proceedings of the 23rd ACM SIGPLAN symposium on principles and practice of …, 2018 | 274 | 2018 |
AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing T Geng, A Li, R Shi, C Wu, T Wang, Y Li, P Haghi, A Tumeo, S Che, ... 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture …, 2020 | 256 | 2020 |
Evaluating modern GPU interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect A Li, SL Song, J Chen, J Li, X Liu, NR Tallent, KJ Barker IEEE Transactions on Parallel and Distributed Systems 31 (1), 94-110, 2019 | 244 | 2019 |
QASMBench: A low-level quantum benchmark suite for NISQ evaluation and simulation A Li, S Stein, S Krishnamoorthy, J Ang ACM Transactions on Quantum Computing 4 (2), 1-26, 2023 | 140* | 2023 |
A synchronization-free algorithm for parallel sparse triangular solves W Liu, A Li, J Hogg, IS Duff, B Vinter Euro-Par 2016: Parallel Processing: 22nd International Conference on …, 2016 | 107 | 2016 |
Adaptive and transparent cache bypassing for GPUs A Li, GJ van den Braak, A Kumar, H Corporaal Proceedings of the International Conference for High Performance Computing …, 2015 | 94 | 2015 |
I-GCN: A graph convolutional network accelerator with runtime locality enhancement through islandization T Geng, C Wu, Y Zhang, C Tan, C Xie, H You, M Herbordt, Y Lin, A Li MICRO-54: 54th annual IEEE/ACM international symposium on microarchitecture …, 2021 | 92 | 2021 |
Locality-aware CTA clustering for modern GPUs A Li, SL Song, W Liu, X Liu, A Kumar, H Corporaal ACM SIGARCH Computer Architecture News 45 (1), 297-311, 2017 | 90 | 2017 |
QuGAN: A quantum state fidelity based generative adversarial network SA Stein, B Baheri, D Chen, Y Mao, Q Guan, A Li, B Fang, S Xu 2021 IEEE International Conference on Quantum Computing and Engineering (QCE …, 2021 | 79* | 2021 |
Accelerating transformer-based deep learning models on FPGAs using column balanced block pruning H Peng, S Huang, T Geng, A Li, W Jiang, H Liu, S Wang, C Ding 2021 22nd International Symposium on Quality Electronic Design (ISQED), 142-148, 2021 | 77 | 2021 |
Tartan: evaluating modern GPU interconnect via a multi-GPU benchmark suite A Li, SL Song, J Chen, X Liu, N Tallent, K Barker 2018 IEEE International Symposium on Workload Characterization (IISWC), 191-202, 2018 | 63 | 2018 |
Fine-grained synchronizations and dataflow programming on GPUs A Li, GJ van den Braak, H Corporaal, A Kumar Proceedings of the 29th ACM on International Conference on Supercomputing …, 2015 | 61 | 2015 |
BNS-GCN: Efficient full-graph training of graph convolutional networks with partition-parallelism and random boundary node sampling C Wan, Y Li, A Li, NS Kim, Y Lin Proceedings of Machine Learning and Systems 4, 673-693, 2022 | 57 | 2022 |
FPDeep: Scalable acceleration of CNN training on deeply-pipelined FPGA clusters T Wang, T Geng, A Li, X Jin, M Herbordt IEEE Transactions on Computers 69 (8), 1143-1158, 2020 | 57* | 2020 |
Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels A Li, W Liu, MRB Kristensen, B Vinter, H Wang, K Hou, A Marquez, ... Proceedings of the International Conference for High Performance Computing …, 2017 | 56 | 2017 |
Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides W Liu, A Li, JD Hogg, IS Duff, B Vinter Concurrency and Computation: Practice and Experience 29 (21), e4244, 2017 | 56 | 2017 |
QuClassi: A hybrid deep neural network architecture based on quantum state fidelity SA Stein, B Baheri, D Chen, Y Mao, Q Guan, A Li, S Xu, C Ding Proceedings of Machine Learning and Systems 4, 251-264, 2022 | 52 | 2022 |
CUDAAdvisor: LLVM-based runtime profiling for modern GPUs D Shen, SL Song, A Li, X Liu Proceedings of the 2018 International Symposium on Code Generation and …, 2018 | 51 | 2018 |
OpenCGRA: An open-source unified framework for modeling, testing, and evaluating CGRAs C Tan, C Xie, A Li, KJ Barker, A Tumeo 2020 IEEE 38th International Conference on Computer Design (ICCD), 381-388, 2020 | 47 | 2020 |
LP-BNN: Ultra-low-latency BNN inference with layer parallelism T Geng, T Wang, C Wu, C Yang, SL Song, A Li, M Herbordt 2019 IEEE 30th International Conference on Application-specific Systems …, 2019 | 45 | 2019 |