Technology-driven, highly-scalable dragonfly topology
Evolving technology and increasing pin-bandwidth motivate the use of high-radix routers to
reduce the diameter, latency, and cost of interconnection networks. High-radix networks …
Energy proportional datacenter networks
Numerous studies have shown that datacenter computers rarely operate at full utilization,
leading to a number of proposals for creating servers that are energy proportional with …
Flattened butterfly: a cost-efficient topology for high-radix networks
Increasing integrated-circuit pin bandwidth has motivated a corresponding increase in the
degree or radix of interconnection networks and their routers. This paper introduces the …
Flattened butterfly topology for on-chip networks
With the trend towards increasing number of cores in chip multiprocessors, the on-chip
interconnect that connects the cores needs to scale efficiently. In this work, we propose the …
Firefly: Illuminating future network-on-chip with nanophotonics
Future many-core processors will require high-performance yet energy-efficient on-chip
networks to provide a communication substrate for the increasing number of cores. Recent …
HyperX: topology, routing, and packaging of efficient large-scale networks
In the push to achieve exascale performance, systems will grow to over 100,000 sockets, as
growing cores-per-socket and improved single-core performance provide only part of the …
Express virtual channels: Towards the ideal interconnection fabric
Due to wire delay scalability and bandwidth limitations inherent in shared buses and
dedicated links, packet-switched on-chip interconnection networks are fast emerging as the …
Think fast: A tensor streaming processor (TSP) for accelerating deep learning workloads
D Abts, J Ross, J Sparling… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
In this paper, we introduce the Tensor Streaming Processor (TSP) architecture, a functionally-
sliced microarchitecture with memory units interleaved with vector and matrix deep learning …
Deep learning training in Facebook data centers: Design of scale-up and scale-out systems
Large-scale training is important to ensure high performance and accuracy of machine-
learning models. At Facebook we use many different models, including computer vision …
A novel dimensionally-decomposed router for on-chip communication in 3D architectures
Much like multi-storey buildings in densely packed metropolises, three-dimensional (3D)
chip structures are envisioned as a viable solution to skyrocketing transistor densities and …