Deep image: Scaling up image recognition R Wu, S Yan, Y Shan, Q Dang, G Sun arXiv preprint arXiv:1501.02876, 2015 | 523 | 2015 |
Evaluating fast algorithms for convolutional neural networks on FPGAs L Lu, Y Liang, Q Xiao, S Yan 2017 IEEE 25th annual international symposium on field-programmable custom …, 2017 | 281 | 2017 |
Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs Q Xiao, Y Liang, L Lu, S Yan, YW Tai Proceedings of the 54th Annual Design Automation Conference 2017, 1-6, 2017 | 225 | 2017 |
yaSpMV: Yet another SpMV framework on GPUs S Yan, C Li, Y Zhang, H Zhou ACM SIGPLAN Notices 49 (8), 107-118, 2014 | 179 | 2014 |
Evaluating fast algorithms for convolutional neural networks on FPGAs Y Liang, L Lu, Q Xiao, S Yan IEEE Transactions on Computer-Aided Design of Integrated Circuits and …, 2019 | 148 | 2019 |
StreamScan: fast scan algorithms for GPUs without global barrier synchronization S Yan, G Long, Y Zhang Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of …, 2013 | 121 | 2013 |
Characterization and prediction of deep learning workloads in large-scale GPU datacenters Q Hu, P Sun, S Yan, Y Wen, T Zhang Proceedings of the International Conference for High Performance Computing …, 2021 | 98 | 2021 |
Optimizing network performance for distributed DNN training on GPU clusters: ImageNet/AlexNet training in 1.5 minutes P Sun, W Feng, R Han, S Yan, Y Wen arXiv preprint arXiv:1902.06855, 2019 | 78 | 2019 |
A coordinated tiling and batching framework for efficient GEMM on GPUs X Li, Y Liang, S Yan, L Jia, Y Li Proceedings of the 24th symposium on principles and practice of parallel …, 2019 | 59 | 2019 |
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach P Sun, Y Wen, NBD Ta, S Yan 2017 IEEE International Conference on Smart Computing (SMARTCOMP), 1-6, 2017 | 46 | 2017 |
GPURoofline: a model for guiding performance optimizations on GPUs H Jia, Y Zhang, G Long, J Xu, S Yan, Y Li Euro-Par 2012 Parallel Processing: 18th International Conference, Euro-Par …, 2012 | 43 | 2012 |
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu, B Wu, X Li, S Yan, Y Liang Proceedings of the 49th Annual International Symposium on Computer …, 2022 | 41 | 2022 |
Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs C Li, Y Yang, H Dai, S Yan, F Mueller, H Zhou 2014 IEEE International Symposium on Performance Analysis of Systems and …, 2014 | 41 | 2014 |
Diesel: A dataset-based distributed storage and caching system for large-scale deep learning training L Wang, S Ye, B Yang, Y Lu, H Zhang, S Yan, Q Luo Proceedings of the 49th International Conference on Parallel Processing, 1-11, 2020 | 30 | 2020 |
GradientFlow: Optimizing network performance for large-scale distributed DNN training P Sun, Y Wen, R Han, W Feng, S Yan IEEE Transactions on Big Data 8 (2), 495-507, 2019 | 29 | 2019 |
Parallelization and performance optimization on face detection algorithm with OpenCL: A case study W Wang, Y Zhang, S Yan, Y Zhang, H Jia Tsinghua Science and Technology 17 (3), 287-295, 2012 | 24 | 2012 |
Enabling efficient fast convolution algorithms on GPUs via MegaKernels L Jia, Y Liang, X Li, L Lu, S Yan IEEE Transactions on Computers 69 (7), 986-997, 2020 | 20 | 2020 |
Timed dataflow: Reducing communication overhead for distributed machine learning systems P Sun, Y Wen, TNB Duong, S Yan 2016 IEEE 22nd International Conference on Parallel and Distributed Systems …, 2016 | 19 | 2016 |
Elan: Towards generic and efficient elastic training for deep learning L Xie, J Zhai, B Wu, Y Wang, X Zhang, P Sun, S Yan 2020 IEEE 40th International Conference on Distributed Computing Systems …, 2020 | 16 | 2020 |
A cross-platform SpMV framework on many-core architectures Y Zhang, S Li, S Yan, H Zhou ACM Transactions on Architecture and Code Optimization (TACO) 13 (4), 1-25, 2016 | 16 | 2016 |