WavePro: clock-less wave-propagated pipeline compiler for low-power and high-throughput computation
2020 Design, Automation & Test in Europe Conference & Exhibition …, 2020•ieeexplore.ieee.org
Clock-less wave-propagated pipelining is a long-known approach to achieving high throughput without the overhead of costly sampling registers. However, due to many design challenges, which have only increased with technology scaling, this approach has never been widely accepted and has generally been limited to small and very specific demonstrations. This paper addresses this barrier by presenting WavePro, a generic and scalable algorithm capable of skew balancing any combinatorial logic netlist for the application of wave-pipelining. The algorithm was implemented in the WavePro Compiler automation utility, which interfaces with industry delay-extraction and standard timing-analysis tools to produce a sign-off quality result. The utility is demonstrated on a dot-product accelerator in a 65 nm CMOS technology, using a vendor-provided standard cell library and commercial timing analysis tools. By reducing the worst-case output skew by over 70%, the test case achieved throughput equivalent to that of an 8-stage sequentially pipelined implementation, with power savings of almost 3×.
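To make "output skew" concrete, the following is a minimal sketch of the general idea and not the WavePro algorithm itself, which the abstract does not detail. It assumes a toy netlist with one fixed delay per gate, measures the gap between the earliest and latest arrival at the outputs, and then applies naive delay padding on early-arriving edges. All names (`arrival_window`, `balance_by_padding`, `SINK`, the gate names and delay values) are hypothetical. In wave-pipelining, the interval between successive input waves is bounded below by roughly this gap plus timing margins, which is why reducing skew raises the achievable throughput.

```python
from graphlib import TopologicalSorter

SINK = "__sink__"  # hypothetical virtual node that collects all outputs


def arrival_window(gates, delays, pad=None):
    """Return {node: (earliest, latest)} arrival times.

    gates maps each gate to its fan-in list; primary inputs arrive at t=0.
    pad maps an edge (src, dst) to extra delay inserted on that edge.
    """
    pad = pad or {}
    window = {}
    for node in TopologicalSorter(gates).static_order():
        fanin = gates.get(node, ())
        if not fanin:  # primary input
            window[node] = (0.0, 0.0)
            continue
        early = min(window[s][0] + pad.get((s, node), 0.0) for s in fanin)
        late = max(window[s][1] + pad.get((s, node), 0.0) for s in fanin)
        window[node] = (early + delays[node], late + delays[node])
    return window


def balance_by_padding(gates, delays):
    """Naive balancing: delay every early fan-in edge to match the latest one."""
    pad, arrival = {}, {}
    for node in TopologicalSorter(gates).static_order():
        fanin = gates.get(node, ())
        if not fanin:
            arrival[node] = 0.0
            continue
        target = max(arrival[s] for s in fanin)
        for s in fanin:
            if arrival[s] < target:
                pad[(s, node)] = target - arrival[s]
        arrival[node] = target + delays[node]
    return pad


# Toy netlist (assumed values): inputs a, b, c; outputs g2 and z.
gates = {"g1": ["a", "b"], "g2": ["g1", "c"], "z": ["c"], SINK: ["g2", "z"]}
delays = {"g1": 2.0, "g2": 3.0, "z": 1.0, SINK: 0.0}

before = arrival_window(gates, delays)[SINK]
pad = balance_by_padding(gates, delays)
after = arrival_window(gates, delays, pad)[SINK]

print("output skew before:", before[1] - before[0])  # 4.0
print("output skew after: ", after[1] - after[0])    # 0.0
```

Real cell delays vary with input data, load, and process/voltage/temperature corners, so a practical flow works with extracted delay ranges and cannot drive skew to exactly zero; the sketch only shows what the metric measures and why padding early paths shrinks it.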