EfficientViT: Memory efficient vision transformer with cascaded group attention X Liu, H Peng, N Zheng, Y Yang, H Hu, Y Yuan Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 150 | 2023 |
nn-Meter: Towards accurate latency prediction of deep-learning model inference on diverse edge devices LL Zhang, S Han, J Wei, N Zheng, T Cao, Y Yang, Y Liu Proceedings of the 19th Annual International Conference on Mobile Systems …, 2021 | 106 | 2021 |
Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction W Cui, H Zhao, Q Chen, N Zheng, J Leng, J Zhao, Z Song, T Ma, Y Yang, ... Proceedings of the International Conference for High Performance Computing …, 2021 | 47 | 2021 |
SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute N Zheng, B Lin, Q Zhang, L Ma, Y Yang, F Yang, Y Wang, M Yang, L Zhou 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2022 | 34 | 2022 |
Online video super-resolution with convolutional kernel bypass grafts J Xiao, X Jiang, N Zheng, H Yang, Y Yang, Y Yang, D Li, KM Lam IEEE Transactions on Multimedia 25, 8972-8987, 2023 | 22 | 2023 |
Astraea: towards QoS-aware and resource-efficient multi-stage GPU services W Zhang, Q Chen, K Fu, N Zheng, Z Huang, J Leng, M Guo Proceedings of the 27th ACM International Conference on Architectural …, 2022 | 20 | 2022 |
Toward QoS-awareness and improved utilization of spatial multitasking GPUs W Zhang, Q Chen, N Zheng, W Cui, K Fu, M Guo IEEE Transactions on Computers 71 (4), 866-879, 2021 | 19 | 2021 |
URSA: Precise capacity planning and fair scheduling based on low-level statistics for public clouds W Zhang, N Zheng, Q Chen, Y Yang, Z Song, T Ma, J Leng, M Guo Proceedings of the 49th International Conference on Parallel Processing, 1-11, 2020 | 16 | 2020 |
Online video streaming super-resolution with adaptive look-up table fusion G Yin, X Jiang, S Jiang, Z Han, N Zheng, H Yang, D Bai, H Tan, S Sun, ... arXiv preprint arXiv:2303.00334 1, 2023 | 9 | 2023 |
Full-cycle energy consumption benchmark for low-carbon computer vision B Li, X Jiang, D Bai, Y Zhang, N Zheng, X Dong, L Liu, Y Yang, D Li arXiv preprint arXiv:2108.13465, 2021 | 9 | 2021 |
PIT: Optimization of dynamic sparse deep learning models via permutation invariant transformation N Zheng, H Jiang, Q Zhang, Z Han, L Ma, Y Yang, F Yang, C Zhang, L Qiu, ... Proceedings of the 29th Symposium on Operating Systems Principles, 331-347, 2023 | 8 | 2023 |
Optimizing dynamic neural networks with Brainstorm W Cui, Z Han, L Ouyang, Y Wang, N Zheng, L Ma, Y Yang, F Yang, J Xue, ... 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 8 | 2023 |
CHARM: Collaborative host and accelerator resource management for GPU datacenters W Zhang, K Fu, N Zheng, Q Chen, C Li, W Zheng, M Guo 2021 IEEE 39th International Conference on Computer Design (ICCD), 307-315, 2021 | 8 | 2021 |
QoS-aware irregular collaborative inference for improving throughput of DNN services K Fu, J Shi, Q Chen, N Zheng, W Zhang, D Zeng, M Guo SC22: International Conference for High Performance Computing, Networking …, 2022 | 6 | 2022 |
Efficient GPU kernels for N:M-sparse weights in deep learning B Lin, N Zheng, L Wang, S Cao, L Ma, Q Zhang, Y Zhu, T Cao, J Xue, ... Proceedings of Machine Learning and Systems 5, 513-525, 2023 | 5 | 2023 |
nn-METER: Towards accurate latency prediction of DNN inference on diverse edge devices LL Zhang, S Han, J Wei, N Zheng, T Cao, Y Liu GetMobile: Mobile Computing and Communications 25 (4), 19-23, 2022 | 4 | 2022 |
Poster: Precise capacity planning for database public clouds N Zheng, Q Chen, Y Yang, J Li, W Zheng, M Guo 2019 28th International Conference on Parallel Architectures and Compilation …, 2019 | 4 | 2019 |
Online streaming video super-resolution with convolutional look-up table G Yin, Z Qu, X Jiang, S Jiang, Z Han, N Zheng, H Yang, X Liu, Y Yang, ... IEEE Transactions on Image Processing 33, 2305-2317, 2024 | 3 | 2024 |
Towards QoS-aware and resource-efficient GPU microservices based on spatial multitasking GPUs in datacenters W Zhang, Q Chen, K Fu, N Zheng, Z Huang, J Leng, C Li, W Zheng, ... arXiv preprint arXiv:2005.02088, 2020 | 3 | 2020 |
SpaceEvo: Hardware-friendly search space design for efficient INT8 inference X Wang, LL Zhang, J Xu, Q Zhang, Y Wang, Y Yang, N Zheng, T Cao, ... Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 2 | 2023 |