Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity. S Cao, C Zhang, Z Yao, W Xiao, L Nie, D Zhan, Y Liu, M Wu, L Zhang. Proceedings of the 2019 ACM/SIGDA International Symposium on Field …, 2019. (Cited by 194)
Balanced sparsity for efficient DNN inference on GPU. Z Yao, S Cao, W Xiao, C Zhang, L Nie. Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), 5676-5683, 2019. (Cited by 119)
SeerNet: Predicting convolutional neural network feature-map sparsity through low-bit quantization. S Cao, L Ma, W Xiao, C Zhang, Y Liu, L Zhang, L Nie, Z Yang. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019. (Cited by 83)
Dense-to-sparse gate for mixture-of-experts. X Nie, S Cao, X Miao, L Ma, J Xue, Y Miao, Z Yang, Z Yang, B Cui. 2021. (Cited by 22)
EvoMoE: An evolutional mixture-of-experts training framework via dense-to-sparse gate. X Nie, X Miao, S Cao, L Ma, Q Liu, J Xue, Y Miao, Y Liu, Z Yang, B Cui. arXiv preprint arXiv:2112.14397, 2021. (Cited by 21)
Integer or floating point? New outlooks for low-bit quantization on large language models. Y Zhang, L Zhao, S Cao, W Wang, T Cao, F Yang, M Yang, S Zhang, N Xu. arXiv preprint arXiv:2305.12356, 2023. (Cited by 12)
Efficient GPU kernels for N:M-sparse weights in deep learning. B Lin, N Zheng, L Wang, S Cao, L Ma, Q Zhang, Y Zhu, T Cao, J Xue, et al. Proceedings of Machine Learning and Systems 5, 513-525, 2023. (Cited by 5)
Pre-gated MoE: An algorithm-system co-design for fast and scalable mixture-of-expert inference. R Hwang, J Wei, S Cao, C Hwang, X Tang, T Cao, M Yang, M Rhu. arXiv preprint arXiv:2308.12066, 2023. (Cited by 4)
BitDistiller: Unleashing the potential of sub-4-bit LLMs via self-distillation. D Du, Y Zhang, S Cao, J Guo, T Cao, X Chu, N Xu. arXiv preprint arXiv:2402.10631, 2024. (Cited by 3)
Accurate and structured pruning for efficient automatic speech recognition. H Jiang, LL Zhang, Y Li, Y Wu, S Cao, T Cao, Y Yang, J Li, M Yang, L Qiu. arXiv preprint arXiv:2305.19549, 2023. (Cited by 3)
AFPQ: Asymmetric floating point quantization for LLMs. Y Zhang, S Zhang, S Cao, D Du, J Wei, T Cao, N Xu. arXiv preprint arXiv:2311.01792, 2023. (Cited by 2)
NN-Stretch: Automatic neural network branching for parallel inference on heterogeneous multi-processors. J Wei, T Cao, S Cao, S Jiang, S Fu, M Yang, Y Zhang, Y Liu. Proceedings of the 21st Annual International Conference on Mobile Systems …, 2023. (Cited by 2)
Adam Accumulation to reduce memory footprints of both activations and gradients for large-scale DNN training. Y Zhang, Y Han, S Cao, G Dai, Y Miao, T Cao, F Yang, N Xu. arXiv preprint arXiv:2305.19982, 2023. (Cited by 1)
T-MAC: CPU renaissance via table lookup for low-bit LLM deployment on edge. J Wei, S Cao, T Cao, L Ma, L Wang, Y Zhang, M Yang. arXiv preprint arXiv:2407.00088, 2024.
FlexSaaS: A reconfigurable accelerator for web search selection. S Cao, L Nie, D Zhan, W Wang, N Xu, R Das, M Wu, L Zhang, D Chiou. ACM Transactions on Reconfigurable Technology and Systems (TRETS) 12 (1), 1-20, 2019.
Bitter: Enabling efficient low-precision deep learning computing through hardware-aware tensor transformation. L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi, N Zheng, Z Miao, F Yang, et al.
The Case for Learning Machine Language. G Liu, CJM Liang, S Cao, S Lu, L van Doorn.