Technology-driven, highly-scalable dragonfly topology

J Kim, WJ Dally, S Scott, D Abts - ACM SIGARCH Computer Architecture …, 2008 - dl.acm.org
Evolving technology and increasing pin-bandwidth motivate the use of high-radix routers to
reduce the diameter, latency, and cost of interconnection networks. High-radix networks …

Energy proportional datacenter networks

D Abts, MR Marty, PM Wells, P Klausler… - Proceedings of the 37th …, 2010 - dl.acm.org
Numerous studies have shown that datacenter computers rarely operate at full utilization,
leading to a number of proposals for creating servers that are energy proportional with …

Flattened butterfly: a cost-efficient topology for high-radix networks

J Kim, WJ Dally, D Abts - Proceedings of the 34th annual international …, 2007 - dl.acm.org
Increasing integrated-circuit pin bandwidth has motivated a corresponding increase in the
degree or radix of interconnection networks and their routers. This paper introduces the …

Flattened butterfly topology for on-chip networks

J Kim, J Balfour, W Dally - 40th Annual IEEE/ACM International …, 2007 - ieeexplore.ieee.org
With the trend towards increasing number of cores in chip multiprocessors, the on-chip
interconnect that connects the cores needs to scale efficiently. In this work, we propose the …

Firefly: Illuminating future network-on-chip with nanophotonics

Y Pan, P Kumar, J Kim, G Memik, Y Zhang… - Proceedings of the 36th …, 2009 - dl.acm.org
Future many-core processors will require high-performance yet energy-efficient on-chip
networks to provide a communication substrate for the increasing number of cores. Recent …

HyperX: topology, routing, and packaging of efficient large-scale networks

JH Ahn, N Binkert, A Davis, M McLaren… - Proceedings of the …, 2009 - dl.acm.org
In the push to achieve exascale performance, systems will grow to over 100,000 sockets, as
growing cores-per-socket and improved single-core performance provide only part of the …

Express virtual channels: Towards the ideal interconnection fabric

A Kumar, LS Peh, P Kundu, NK Jha - ACM SIGARCH Computer …, 2007 - dl.acm.org
Due to wire delay scalability and bandwidth limitations inherent in shared buses and
dedicated links, packet-switched on-chip interconnection networks are fast emerging as the …

Think fast: A tensor streaming processor (TSP) for accelerating deep learning workloads

D Abts, J Ross, J Sparling… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
In this paper, we introduce the Tensor Streaming Processor (TSP) architecture, a functionally-
sliced microarchitecture with memory units interleaved with vector and matrix deep learning …

Deep learning training in Facebook data centers: Design of scale-up and scale-out systems

M Naumov, J Kim, D Mudigere, S Sridharan… - arXiv preprint arXiv …, 2020 - arxiv.org
Large-scale training is important to ensure high performance and accuracy of machine-
learning models. At Facebook we use many different models, including computer vision …

A novel dimensionally-decomposed router for on-chip communication in 3D architectures

J Kim, C Nicopoulos, D Park, R Das, Y Xie… - Proceedings of the 34th …, 2007 - dl.acm.org
Much like multi-storey buildings in densely packed metropolises, three-dimensional (3D)
chip structures are envisioned as a viable solution to skyrocketing transistor densities and …