Shengen Yan
Verified email at ie.cuhk.edu.hk
Deep image: Scaling up image recognition
R Wu, S Yan, Y Shan, Q Dang, G Sun
arXiv preprint arXiv:1501.02876, 2015
Evaluating fast algorithms for convolutional neural networks on FPGAs
L Lu, Y Liang, Q Xiao, S Yan
2017 IEEE 25th annual international symposium on field-programmable custom …, 2017
Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs
Q Xiao, Y Liang, L Lu, S Yan, YW Tai
Proceedings of the 54th Annual Design Automation Conference 2017, 1-6, 2017
yaSpMV: Yet another SpMV framework on GPUs
S Yan, C Li, Y Zhang, H Zhou
ACM SIGPLAN Notices 49 (8), 107-118, 2014
Evaluating fast algorithms for convolutional neural networks on FPGAs
Y Liang, L Lu, Q Xiao, S Yan
IEEE Transactions on Computer-Aided Design of Integrated Circuits and …, 2019
StreamScan: fast scan algorithms for GPUs without global barrier synchronization
S Yan, G Long, Y Zhang
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of …, 2013
Characterization and prediction of deep learning workloads in large-scale GPU datacenters
Q Hu, P Sun, S Yan, Y Wen, T Zhang
Proceedings of the International Conference for High Performance Computing …, 2021
Optimizing network performance for distributed DNN training on GPU clusters: ImageNet/AlexNet training in 1.5 minutes
P Sun, W Feng, R Han, S Yan, Y Wen
arXiv preprint arXiv:1902.06855, 2019
A coordinated tiling and batching framework for efficient GEMM on GPUs
X Li, Y Liang, S Yan, L Jia, Y Li
Proceedings of the 24th symposium on principles and practice of parallel …, 2019
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach
P Sun, Y Wen, NBD Ta, S Yan
2017 IEEE International Conference on Smart Computing (SMARTCOMP), 1-6, 2017
GPURoofline: a model for guiding performance optimizations on GPUs
H Jia, Y Zhang, G Long, J Xu, S Yan, Y Li
Euro-Par 2012 Parallel Processing: 18th International Conference, Euro-Par …, 2012
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction
S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu, B Wu, X Li, S Yan, Y Liang
Proceedings of the 49th Annual International Symposium on Computer …, 2022
Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs
C Li, Y Yang, H Dai, S Yan, F Mueller, H Zhou
2014 IEEE International Symposium on Performance Analysis of Systems and …, 2014
Diesel: A dataset-based distributed storage and caching system for large-scale deep learning training
L Wang, S Ye, B Yang, Y Lu, H Zhang, S Yan, Q Luo
Proceedings of the 49th International Conference on Parallel Processing, 1-11, 2020
GradientFlow: Optimizing network performance for large-scale distributed DNN training
P Sun, Y Wen, R Han, W Feng, S Yan
IEEE Transactions on Big Data 8 (2), 495-507, 2019
Parallelization and performance optimization on face detection algorithm with OpenCL: A case study
W Wang, Y Zhang, S Yan, Y Zhang, H Jia
Tsinghua Science and Technology 17 (3), 287-295, 2012
Enabling efficient fast convolution algorithms on GPUs via MegaKernels
L Jia, Y Liang, X Li, L Lu, S Yan
IEEE Transactions on Computers 69 (7), 986-997, 2020
Timed dataflow: Reducing communication overhead for distributed machine learning systems
P Sun, Y Wen, TNB Duong, S Yan
2016 IEEE 22nd International Conference on Parallel and Distributed Systems …, 2016
Elan: Towards generic and efficient elastic training for deep learning
L Xie, J Zhai, B Wu, Y Wang, X Zhang, P Sun, S Yan
2020 IEEE 40th International Conference on Distributed Computing Systems …, 2020
A cross-platform SpMV framework on many-core architectures
Y Zhang, S Li, S Yan, H Zhou
ACM Transactions on Architecture and Code Optimization (TACO) 13 (4), 1-25, 2016