Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs X Wei, CH Yu, P Zhang, Y Chen, Y Wang, H Hu, Y Liang, J Cong Proceedings of the 54th Annual Design Automation Conference 2017, 1-6, 2017 | 459 | 2017 |
Polyhedral-based data reuse optimization for configurable computing LN Pouchet, P Zhang, P Sadayappan, J Cong Proceedings of the ACM/SIGDA International Symposium on Field Programmable …, 2013 | 225 | 2013 |
Memory partitioning for multidimensional arrays in high-level synthesis Y Wang, P Li, P Zhang, C Zhang, J Cong Proceedings of the 50th Annual Design Automation Conference, 1-8, 2013 | 118 | 2013 |
Optimizing memory hierarchy allocation with loop transformations for high-level synthesis J Cong, P Zhang, Y Zou Proceedings of the 49th Annual Design Automation Conference, 1233-1238, 2012 | 84 | 2012 |
TGPA: Tile-grained pipeline architecture for low latency CNN inference X Wei, Y Liang, X Li, CH Yu, P Zhang, J Cong 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 1-8, 2018 | 78 | 2018 |
Automated accelerator generation and optimization with composable, parallel and pipeline architecture J Cong, P Wei, CH Yu, P Zhang Proceedings of the 55th Annual Design Automation Conference, 1-6, 2018 | 75 | 2018 |
An optimal microarchitecture for stencil computation acceleration based on non-uniform partitioning of data reuse buffers J Cong, P Li, B Xiao, P Zhang Proceedings of the 51st Annual Design Automation Conference, 1-6, 2014 | 66 | 2014 |
Systems and methods for systolic array design from a high-level program P Zhang, CH Yu, X Wei, P Pan US Patent 10,838,910, 2020 | 60 | 2020 |
Multiple modes intra-prediction in intra coding P Zhang, D Zhao, S Ma, Y Lu, W Gao 2004 IEEE International Conference on Multimedia and Expo (ICME)(IEEE Cat …, 2004 | 55 | 2004 |
HLScope+: Fast and accurate performance estimation for FPGA HLS Y Choi, P Zhang, P Li, J Cong 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 691-698, 2017 | 51 | 2017 |
Source-to-source optimization for HLS J Cong, M Huang, P Pan, Y Wang, P Zhang FPGAs for Software Programmers, 137-163, 2016 | 51 | 2016 |
Resource-aware throughput optimization for high-level synthesis P Li, P Zhang, LN Pouchet, J Cong Proceedings of the 2015 ACM/SIGDA International Symposium on Field …, 2015 | 49 | 2015 |
S2FA: An accelerator automation framework for heterogeneous computing in datacenters CH Yu, P Wei, M Grossman, P Zhang, V Sarker, J Cong Proceedings of the 55th Annual Design Automation Conference, 1-6, 2018 | 44 | 2018 |
Variable-bin-rate CABAC engine for H.264/AVC high definition real-time decoding P Zhang, D Xie, W Gao IEEE Transactions on Very Large Scale Integration (VLSI) Systems 17 (3), 417-426, 2009 | 44 | 2009 |
Combining computation and communication optimizations in system synthesis for streaming applications J Cong, M Huang, P Zhang Proceedings of the 2014 ACM/SIGDA International Symposium on Field …, 2014 | 41 | 2014 |
An integrated and automated memory optimization flow for FPGA behavioral synthesis Y Wang, P Zhang, X Cheng, J Cong 17th Asia and South Pacific Design Automation Conference, 257-262, 2012 | 41 | 2012 |
Mode mapping method for H.264/AVC spatial downscaling transcoding P Zhang, Y Lu, Q Huang, W Gao 2004 International Conference on Image Processing, 2004. ICIP'04. 4, 2781-2784, 2004 | 41 | 2004 |
Memory partitioning and scheduling co-optimization in behavioral synthesis P Li, Y Wang, P Zhang, G Luo, T Wang, J Cong Proceedings of the International Conference on Computer-Aided Design, 488-495, 2012 | 39 | 2012 |
Combining module selection and replication for throughput-driven streaming programs J Cong, M Huang, B Liu, P Zhang, Y Zou 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE …, 2012 | 37 | 2012 |
Combined loop transformation and hierarchy allocation for data reuse optimization J Cong, P Zhang, Y Zou 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 185-192, 2011 | 34 | 2011 |