Map-reduce for machine learning on multicore CT Chu, S Kim, YA Lin, YY Yu, G Bradski, K Olukotun, A Ng Advances in neural information processing systems 19, 2006 | 1890 | 2006 |
Niagara: A 32-way multithreaded SPARC processor P Kongetira, K Aingaran, K Olukotun IEEE micro 25 (2), 21-29, 2005 | 1328 | 2005 |
STAMP: Stanford transactional applications for multi-processing CC Minh, JW Chung, C Kozyrakis, K Olukotun 2008 IEEE International Symposium on Workload Characterization, 35-46, 2008 | 1319 | 2008 |
The case for a single-chip multiprocessor K Olukotun, BA Nayfeh, L Hammond, K Wilson, K Chang ACM Sigplan Notices 31 (9), 2-11, 1996 | 1187 | 1996 |
Transactional memory coherence and consistency L Hammond, V Wong, M Chen, BD Carlstrom, JD Davis, B Hertzberg, ... ACM SIGARCH Computer Architecture News 32 (2), 102, 2004 | 1043 | 2004 |
A single-chip multiprocessor BA Nayfeh, K Olukotun Computer 30 (9), 79-85, 1997 | 672 | 1997 |
The Stanford Hydra CMP L Hammond, BA Hubbert, M Siu, MK Prabhu, M Chen, K Olukotun IEEE micro 20 (2), 71-84, 2000 | 551 | 2000 |
Data speculation support for a chip multiprocessor L Hammond, M Willey, K Olukotun ACM SIGOPS Operating Systems Review 32 (5), 58-69, 1998 | 546 | 1998 |
Accelerating CUDA graph algorithms at maximum warp S Hong, SK Kim, T Oguntebi, K Olukotun ACM Sigplan Notices 46 (8), 267-276, 2011 | 506 | 2011 |
An effective hybrid transactional memory system with strong isolation guarantees CC Minh, M Trautmann, JW Chung, A McDonald, N Bronson, J Casper, ... Proceedings of the 34th annual international symposium on Computer …, 2007 | 458 | 2007 |
Efficient parallel graph exploration on multi-core CPU and GPU S Hong, T Oguntebi, K Olukotun 2011 International Conference on Parallel Architectures and Compilation …, 2011 | 411 | 2011 |
Green-Marl: a DSL for easy and efficient graph analysis S Hong, H Chafi, E Sedlar, K Olukotun Proceedings of the seventeenth international conference on Architectural …, 2012 | 408 | 2012 |
The Future of Microprocessors: Chip multiprocessors’ promise of huge performance gains is now a reality. K Olukotun, L Hammond Queue 3 (7), 26-29, 2005 | 393 | 2005 |
REMARC: Reconfigurable multimedia array coprocessor T Miyamori, K Olukotun IEICE Transactions on information and systems 82 (2), 389-397, 1999 | 383 | 1999 |
Dawnbench: An end-to-end deep learning benchmark and competition C Coleman, D Narayanan, D Kang, T Zhao, J Zhang, L Nardi, P Bailis, ... Training 100 (101), 102, 2017 | 382 | 2017 |
Liszt: a domain specific language for building portable mesh-based PDE solvers Z DeVito, N Joubert, F Palacios, S Oakley, M Medina, M Barrientos, ... Proceedings of 2011 international conference for high performance computing …, 2011 | 329 | 2011 |
Plasticine: A reconfigurable architecture for parallel patterns R Prabhakar, Y Zhang, D Koeplinger, M Feldman, T Zhao, S Hadjis, ... ACM SIGARCH Computer Architecture News 45 (2), 389-402, 2017 | 313 | 2017 |
Emptyheaded: A relational engine for graph processing CR Aberger, A Lamb, S Tu, A Nötzli, K Olukotun, C Ré ACM Transactions on Database Systems (TODS) 42 (4), 1-44, 2017 | 309 | 2017 |
A practical concurrent binary search tree NG Bronson, J Casper, H Chafi, K Olukotun ACM Sigplan Notices 45 (5), 257-268, 2010 | 305 | 2010 |
OptiML: an implicitly parallel domain-specific language for machine learning A Sujeeth, HJ Lee, K Brown, T Rompf, H Chafi, M Wu, A Atreya, ... Proceedings of the 28th International Conference on Machine Learning (ICML …, 2011 | 304 | 2011 |