Microarchitecture of a high radix router

J Kim, WJ Dally, S Scott, D Abts - ACM SIGARCH Computer Architecture …, 2008 - dl.acm.org

Evolving technology and increasing pin-bandwidth motivate the use of high-radix routers to
reduce the diameter, latency, and cost of interconnection networks. High-radix networks …

Cited by 1122 Related articles All 24 versions

[PDF] aboutdc.ru

Energy proportional datacenter networks

D Abts, MR Marty, PM Wells, P Klausler… - Proceedings of the 37th …, 2010 - dl.acm.org

Numerous studies have shown that datacenter computers rarely operate at full utilization,
leading to a number of proposals for creating servers that are energy proportional with …

Cited by 632 Related articles All 25 versions

[PDF] researchgate.net

Flattened butterfly: a cost-efficient topology for high-radix networks

J Kim, WJ Dally, D Abts - Proceedings of the 34th annual international …, 2007 - dl.acm.org

Increasing integrated-circuit pin bandwidth has motivated a corresponding increase in the
degree or radix of interconnection networks and their routers. This paper introduces the …

Cited by 658 Related articles All 21 versions

[PDF] atwiki.jp

Flattened butterfly topology for on-chip networks

J Kim, J Balfour, W Dally - 40th Annual IEEE/ACM International …, 2007 - ieeexplore.ieee.org

With the trend towards increasing number of cores in chip multiprocessors, the on-chip
interconnect that connects the cores needs to scale efficiently. In this work, we propose the …

Cited by 611 Related articles All 25 versions

[PDF] northwestern.edu

Firefly: Illuminating future network-on-chip with nanophotonics

Y Pan, P Kumar, J Kim, G Memik, Y Zhang… - Proceedings of the 36th …, 2009 - dl.acm.org

Future many-core processors will require high-performance yet energy-efficient on-chip
networks to provide a communication substrate for the increasing number of cores. Recent …

Cited by 538 Related articles All 22 versions

[PDF] utah.edu

HyperX: topology, routing, and packaging of efficient large-scale networks

JH Ahn, N Binkert, A Davis, M McLaren… - Proceedings of the …, 2009 - dl.acm.org

In the push to achieve exascale performance, systems will grow to over 100,000 sockets, as
growing cores-per-socket and improved single-core performance provide only part of the …

Cited by 379 Related articles All 10 versions

[PDF] berkeley.edu

Express virtual channels: Towards the ideal interconnection fabric

A Kumar, LS Peh, P Kundu, NK Jha - ACM SIGARCH Computer …, 2007 - dl.acm.org

Due to wire delay scalability and bandwidth limitations inherent in shared buses and
dedicated links, packet-switched on-chip interconnection networks are fast emerging as the …

Cited by 538 Related articles All 14 versions

[PDF] researchgate.net

Think fast: A tensor streaming processor (TSP) for accelerating deep learning workloads

D Abts, J Ross, J Sparling… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org

In this paper, we introduce the Tensor Streaming Processor (TSP) architecture, a functionally-
sliced microarchitecture with memory units interleaved with vector and matrix deep learning …

Cited by 106 Related articles All 10 versions

[PDF] arxiv.org

Deep learning training in Facebook data centers: Design of scale-up and scale-out systems

M Naumov, J Kim, D Mudigere, S Sridharan… - arXiv preprint arXiv …, 2020 - arxiv.org

Large-scale training is important to ensure high performance and accuracy of machine-
learning models. At Facebook we use many different models, including computer vision …

Cited by 92 Related articles All 3 versions

[PDF] researchgate.net

A novel dimensionally-decomposed router for on-chip communication in 3D architectures

J Kim, C Nicopoulos, D Park, R Das, Y Xie… - Proceedings of the 34th …, 2007 - dl.acm.org

Much like multi-storey buildings in densely packed metropolises, three-dimensional (3D)
chip structures are envisioned as a viable solution to skyrocketing transistor densities and …

Cited by 364 Related articles All 20 versions