In-datacenter performance analysis of a tensor processing unit NP Jouppi, C Young, N Patil, D Patterson, G Agrawal, R Bajwa, S Bates, ... Proceedings of the 44th Annual International Symposium on Computer …, 2017 | 5357 | 2017 |
Memory consistency and event ordering in scalable shared-memory multiprocessors K Gharachorloo, D Lenoski, J Laudon, P Gibbons, A Gupta, J Hennessy ACM SIGARCH Computer Architecture News 18 (2SI), 15-26, 1990 | 1820 | 1990 |
The Stanford DASH multiprocessor D Lenoski, J Laudon, K Gharachorloo, WD Weber, A Gupta, J Hennessy, ... Computer 25 (3), 63-79, 1992 | 1499 | 1992 |
The SGI Origin: a ccNUMA highly scalable server J Laudon, D Lenoski ACM SIGARCH Computer Architecture News 25 (2), 241-251, 1997 | 1241 | 1997 |
The directory-based cache coherence protocol for the DASH multiprocessor D Lenoski, J Laudon, K Gharachorloo, A Gupta, J Hennessy ACM SIGARCH Computer Architecture News 18 (2SI), 148-159, 1990 | 999 | 1990 |
A graph placement methodology for fast chip design A Mirhoseini, A Goldie, M Yazgan, JW Jiang, E Songhori, S Wang, YJ Lee, ... Nature 594 (7862), 207-212, 2021 | 524 | 2021 |
Fair queuing memory systems KJ Nesbit, N Aggarwal, J Laudon, JE Smith 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture …, 2006 | 469 | 2006 |
Ten Lessons From Three Generations Shaped Google’s TPUv4i: Industrial Product NP Jouppi, DH Yoon, M Ashcraft, M Gottscho, TB Jablin, G Kurian, ... 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture …, 2021 | 317 | 2021 |
A domain-specific supercomputer for training deep neural networks NP Jouppi, DH Yoon, G Kurian, S Li, N Patil, J Laudon, C Young, ... Communications of the ACM 63 (7), 67-78, 2020 | 282 | 2020 |
The DASH prototype: Logic overhead and performance D Lenoski, J Laudon, T Joe, D Nakahira, L Stevens, A Gupta, J Hennessy IEEE Transactions on Parallel and Distributed Systems 4 (1), 41-61, 1993 | 278 | 1993 |
The DASH prototype: Implementation and performance D Lenoski, J Laudon, T Joe, D Nakahira, L Stevens, A Gupta, J Hennessy ACM SIGARCH Computer Architecture News 20 (2), 92-103, 1992 | 274 | 1992 |
Chip Placement with Deep Reinforcement Learning A Mirhoseini, A Goldie, M Yazgan, J Jiang, E Songhori, S Wang, YJ Lee, ... arXiv preprint arXiv:2004.10746, 2020 | 236 | 2020 |
Virtual private caches KJ Nesbit, J Laudon, JE Smith Proceedings of the 34th annual international symposium on Computer …, 2007 | 229 | 2007 |
Interleaving: A multithreading technique targeting multiprocessors and workstations J Laudon, A Gupta, M Horowitz ACM SIGPLAN Notices 29 (11), 308-318, 1994 | 169 | 1994 |
Apparatus and method for profiling system events in a fine grain multi-threaded multi-core processor N Kosche, JP Laudon, AR Talcott, S Patel, F Sajjadian US Patent 8,762,951, 2014 | 166 | 2014 |
Maximizing CMP throughput with mediocre cores JD Davis, J Laudon, K Olukotun 14th International Conference on Parallel Architectures and Compilation …, 2005 | 165 | 2005 |
Chip multiprocessor architecture: techniques to improve throughput and latency OA Olukotun, L Hammond, JP Laudon Morgan & Claypool Publishers, 2007 | 160 | 2007 |
Mixture-of-experts with expert choice routing Y Zhou, T Lei, H Liu, N Du, Y Huang, V Zhao, AM Dai, QV Le, J Laudon Advances in Neural Information Processing Systems 35, 7103-7114, 2022 | 158 | 2022 |
The ZS-1 central processor JE Smith, GE Dermer, BD Vanderwarn, SD Klinger, CM Rozewski, ... ACM SIGARCH Computer Architecture News 15 (5), 199-204, 1987 | 151 | 1987 |
High memory capacity DIMM with data and state memory JP Laudon, DE Lenoski, J Manton, ME Anderson US Patent 6,049,476, 2000 | 137 | 2000 |