Customizing Clos network-on-chip for neural networks
IEEE Transactions on Computers, 2017•ieeexplore.ieee.org
Large-scale neural network accelerators are often implemented as a many-core chip and rely on a network-on-chip to manage the huge amount of inter-neuron traffic. The baseline and different variations of the well-known mesh and tree topologies are the most popular topologies in prior many-core implementations of neural networks. However, the grid-like mesh and hierarchical tree topologies suffer from high diameter and low bisection bandwidth, respectively. In this paper, we present ClosNN, a customized Clos topology for Neural Networks. The inherent capability of Clos to support multicast and broadcast traffic in a simple and efficient way, as well as its adaptable bisection bandwidth, is the major motivation behind proposing a customized version of this topology as the communication infrastructure of large-scale neural network implementations. We compare ClosNN with some state-of-the-art NoC topologies adopted in recent neural network hardware accelerators and show that it offers lower average message hop count and higher throughput, which directly translates to faster neural information processing.
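The diameter and bisection argument in the abstract can be illustrated with textbook formulas. The sketch below is not taken from the paper: the function names, the 64-endpoint sizing, and the Clos parameters (m middle switches, n ports per input switch, r input switches) are illustrative assumptions, and "diameter" here counts inter-switch links on the longest path.

```python
def mesh_metrics(k):
    """k x k 2D mesh without wraparound links (k assumed even)."""
    diameter = 2 * (k - 1)   # corner router to the opposite corner
    bisection_links = k      # links cut when the grid is split down the middle
    return diameter, bisection_links


def binary_tree_metrics(leaves):
    """Complete binary tree of routers; `leaves` must be a power of two."""
    depth = leaves.bit_length() - 1   # exact log2 for powers of two
    diameter = 2 * depth              # leaf -> root -> leaf
    bisection_links = 1               # one link below the root separates half the leaves
    return diameter, bisection_links


def clos_metrics(m, n, r):
    """Three-stage Clos with r input switches of n ports each and m middle switches.

    Every path is input switch -> middle switch -> output switch, so the
    longest path is always 2 inter-switch links. The number of links between
    the input and middle stages (r * m) is used here as a rough proxy for how
    bandwidth through the middle of the network scales with m; it is an
    assumption of this sketch, not the paper's bisection metric.
    """
    diameter = 2
    input_to_middle_links = r * m
    return diameter, input_to_middle_links


if __name__ == "__main__":
    # Hypothetical sizing: 64 endpoints in each topology.
    print("mesh 8x8      :", mesh_metrics(8))
    print("binary tree 64:", binary_tree_metrics(64))
    print("Clos m=4      :", clos_metrics(m=4, n=8, r=8))
    print("Clos m=8      :", clos_metrics(m=8, n=8, r=8))
```

Under these assumptions the mesh path length grows with the grid side and the tree is limited by a single link near the root, while the Clos path length stays constant and its middle-stage link count can be raised simply by adding middle switches. That is the sense in which the abstract calls its bisection bandwidth adaptable.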