EIE: Efficient inference engine on compressed deep neural network S Han, X Liu, H Mao, J Pu, A Pedram, MA Horowitz, WJ Dally ACM SIGARCH Computer Architecture News 44 (3), 243-254, 2016 | 3175 | 2016 |
Tetris: Scalable and efficient neural network acceleration with 3D memory M Gao, J Pu, X Yang, M Horowitz, C Kozyrakis Proceedings of the Twenty-Second International Conference on Architectural …, 2017 | 657 | 2017 |
Interstellar: Using Halide's scheduling language to analyze DNN accelerators X Yang, M Gao, Q Liu, J Setter, J Pu, A Nayak, S Bell, K Cao, H Ha, ... Proceedings of the Twenty-Fifth International Conference on Architectural …, 2020 | 248 | 2020 |
Tangram: Optimized coarse-grained dataflow for scalable NN accelerators M Gao, X Yang, J Pu, M Horowitz, C Kozyrakis Proceedings of the Twenty-Fourth International Conference on Architectural …, 2019 | 176 | 2019 |
Programming heterogeneous systems from an image processing DSL J Pu, S Bell, X Yang, J Setter, S Richardson, J Ragan-Kelley, M Horowitz ACM Transactions on Architecture and Code Optimization (TACO) 14 (3), 1-25, 2017 | 158 | 2017 |
DNN dataflow choice is overrated X Yang, M Gao, J Pu, A Nayak, Q Liu, SE Bell, JO Setter, K Cao, H Ha, ... arXiv preprint arXiv:1809.04070 6, 5, 2018 | 104 | 2018 |
A systematic approach to blocking convolutional neural networks X Yang, J Pu, BB Rister, N Bhagdikar, S Richardson, S Kvatinsky, ... arXiv preprint arXiv:1606.04209, 2016 | 79 | 2016 |
Deep compression and EIE: Efficient inference engine on compressed deep neural network. S Han, X Liu, H Mao, J Pu, A Pedram, M Horowitz, B Dally Hot Chips Symposium, 1-6, 2016 | 60 | 2016 |
FPU generator for design space exploration S Galal, O Shacham, JS Brunhaver II, J Pu, A Vassiliev, M Horowitz 2013 IEEE 21st Symposium on Computer Arithmetic, 25-34, 2013 | 39 | 2013 |
A 220pJ/pixel/frame CMOS image sensor with partial settling readout architecture S Ji, J Pu, BC Lim, M Horowitz 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), 1-2, 2016 | 14 | 2016 |
FPMax: A 106GFLOPS/W at 217GFLOPS/mm2 single-precision FPU, and a 43.7 GFLOPS/W at 74.6 GFLOPS/mm2 double-precision FPU, in 28nm UTBB FDSOI J Pu, S Galal, X Yang, O Shacham, M Horowitz arXiv preprint arXiv:1606.07852, 2016 | 12 | 2016 |
MDig: Multi-digit recognition using convolutional neural network on mobile X Yang, J Pu Proc. Yang2015 MDigMR, 1-10, 2015 | 10 | 2015 |
Compiling algorithms for heterogeneous systems S Bell, J Pu, J Hegarty, M Horowitz, M Martonosi Morgan & Claypool Publishers, 2018 | 7 | 2018 |
Performance Investigation on p-Type Si-, Ge-, and Ge–Si Core–Shell Nanowire Schottky Barrier Transistors J Pu, L Sun, RQ Han Japanese Journal of Applied Physics 50 (4S), 04DN10, 2011 | 5 | 2011 |
Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network S Han, X Liu, H Mao, J Pu, A Pedram, MA Horowitz, WJ Dally arXiv preprint arXiv:2306.09552, 2023 | 2 | 2023 |
Programming Heterogeneous Systems From an Image Processing Domain Specific Language J Pu Stanford University, 2017 | 2 | 2017 |
Image Processing with Stencil Pipelines S Bell, J Pu, J Hegarty, M Horowitz Compiling Algorithms for Heterogeneous Systems, 27-31, 2018 | 1 | 2018 |
Interstellar X Yang, M Gao, Q Liu, J Setter, J Pu, A Nayak, S Bell, K Cao, H Ha, ... Proceedings of the Twenty-Fifth International Conference on Architectural …, 2020 | | 2020 |
Darkroom: A Stencil Language for Image Processing S Bell, J Pu, J Hegarty, M Horowitz Compiling Algorithms for Heterogeneous Systems, 33-50, 2018 | | 2018 |
Interfacing with Specialized Hardware S Bell, J Pu, J Hegarty, M Horowitz Compiling Algorithms for Heterogeneous Systems, 69-80, 2018 | | 2018 |