Application-aware deadlock-free oblivious routing

M Ahmad, F Hijaz, Q Shi, O Khan - 2015 IEEE International …, 2015 - ieeexplore.ieee.org

Algorithms operating on a graph setting are known to be highly irregular and unstructured.
This leads to workload imbalance and data locality challenge when these algorithms are …

Cited by 140 · Related articles · All 7 versions

An abacus turn model for time/space-efficient reconfigurable routing

B Fu, Y Han, J Ma, H Li, X Li - Proceedings of the 38th annual …, 2011 - dl.acm.org

Applications' traffic tends to be bursty and the location of hot-spot nodes moves as time goes
by. This will significantly aggravate the blocking problem of wormhole-routed Network-on …

Cited by 123 · Related articles · All 7 versions

Self-aware computing in the Angstrom processor

H Hoffmann, J Holt, G Kurian, E Lau, M Maggio… - Proceedings of the 49th …, 2012 - dl.acm.org

Addressing the challenges of extreme scale computing requires holistic design of new
programming models and systems that support those models. This paper discusses the …

Cited by 109 · Related articles · All 12 versions

Leveraging latency-insensitivity to ease multiple FPGA design

KE Fleming, M Adler, M Pellauer, A Parashar… - Proceedings of the …, 2012 - dl.acm.org

Traditionally, hardware designs partitioned across multiple FPGAs have had low
performance due to the inefficiency of maintaining cycle-by-cycle timing among discrete …

Cited by 77 · Related articles · All 8 versions

DARSIM: a parallel cycle-level NoC simulator

M Lis, KS Shim, MH Cho, P Ren, O Khan, S Devadas - 2010 - dspace.mit.edu

We present DARSIM, a parallel, highly configurable, cycle-level network-on-chip simulator
based on an ingress-queued wormhole router architecture. The parallel simulation engine …

Cited by 85 · Related articles · All 10 versions

Hornet: A cycle-level multicore simulator

P Ren, M Lis, MH Cho, KS Shim… - … on Computer-Aided …, 2012 - ieeexplore.ieee.org

We present hornet, a parallel, highly configurable, cycle-level multicore simulator based on
an ingress-queued wormhole router network-on-chip (NoC) architecture. The parallel …

Cited by 81 · Related articles · All 12 versions

Bandwidth-optimal all-to-all exchanges in fat tree networks

B Prisacari, G Rodriguez, C Minkenberg… - Proceedings of the 27th …, 2013 - dl.acm.org

The personalized all-to-all collective exchange is one of the most challenging
communication patterns in HPC applications in terms of performance and scalability. In the …

Cited by 66 · Related articles · All 22 versions

Scalable, accurate multicore simulation in the 1000-core era

M Lis, P Ren, MH Cho, KS Shim… - (IEEE ISPASS) IEEE …, 2011 - ieeexplore.ieee.org

We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based
on an ingress-queued worm-hole router NoC architecture. The parallel simulation engine …

Cited by 87 · Related articles · All 15 versions

Scalable interconnects for reconfigurable spatial architectures

Y Zhang, A Rucker, M Vilim, R Prabhakar… - Proceedings of the 46th …, 2019 - dl.acm.org

Recent years have seen the increased adoption of Coarse-Grained Reconfigurable
Architectures (CGRAs) as flexible, energy-efficient compute accelerators. Obtaining …

Cited by 31 · Related articles · All 8 versions

Heracles: a tool for fast RTL-based design space exploration of multicore processors

MA Kinsy, M Pellauer, S Devadas - Proceedings of the ACM/SIGDA …, 2013 - dl.acm.org

This paper presents Heracles, an open-source, functional, parameterized, synthesizable
multicore system toolkit. Such a multi/many-core design platform is a powerful and versatile …

Cited by 60 · Related articles · All 13 versions