Crono: A benchmark suite for multithreaded graph algorithms executing on futuristic multicores

M Ahmad, F Hijaz, Q Shi, O Khan - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
Algorithms operating on a graph setting are known to be highly irregular and unstructured.
This leads to workload imbalance and data locality challenges when these algorithms are …

An abacus turn model for time/space-efficient reconfigurable routing

B Fu, Y Han, J Ma, H Li, X Li - Proceedings of the 38th annual …, 2011 - dl.acm.org
Applications' traffic tends to be bursty and the location of hot-spot nodes moves as time goes
by. This will significantly aggravate the blocking problem of wormhole-routed Network-on …

Self-aware computing in the Angstrom processor

H Hoffmann, J Holt, G Kurian, E Lau, M Maggio… - Proceedings of the 49th …, 2012 - dl.acm.org
Addressing the challenges of extreme scale computing requires holistic design of new
programming models and systems that support those models. This paper discusses the …

Leveraging latency-insensitivity to ease multiple FPGA design

KE Fleming, M Adler, M Pellauer, A Parashar… - Proceedings of the …, 2012 - dl.acm.org
Traditionally, hardware designs partitioned across multiple FPGAs have had low
performance due to the inefficiency of maintaining cycle-by-cycle timing among discrete …

DARSIM: a parallel cycle-level NoC simulator

M Lis, KS Shim, MH Cho, P Ren, O Khan, S Devadas - 2010 - dspace.mit.edu
We present DARSIM, a parallel, highly configurable, cycle-level network-on-chip simulator
based on an ingress-queued wormhole router architecture. The parallel simulation engine …

Hornet: A cycle-level multicore simulator

P Ren, M Lis, MH Cho, KS Shim… - … on Computer-Aided …, 2012 - ieeexplore.ieee.org
We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based on
an ingress-queued wormhole router network-on-chip (NoC) architecture. The parallel …

Bandwidth-optimal all-to-all exchanges in fat tree networks

B Prisacari, G Rodriguez, C Minkenberg… - Proceedings of the 27th …, 2013 - dl.acm.org
The personalized all-to-all collective exchange is one of the most challenging
communication patterns in HPC applications in terms of performance and scalability. In the …

Scalable, accurate multicore simulation in the 1000-core era

M Lis, P Ren, MH Cho, KS Shim… - (IEEE ISPASS) IEEE …, 2011 - ieeexplore.ieee.org
We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based
on an ingress-queued wormhole router NoC architecture. The parallel simulation engine …

Scalable interconnects for reconfigurable spatial architectures

Y Zhang, A Rucker, M Vilim, R Prabhakar… - Proceedings of the 46th …, 2019 - dl.acm.org
Recent years have seen the increased adoption of Coarse-Grained Reconfigurable
Architectures (CGRAs) as flexible, energy-efficient compute accelerators. Obtaining …

Heracles: a tool for fast RTL-based design space exploration of multicore processors

MA Kinsy, M Pellauer, S Devadas - Proceedings of the ACM/SIGDA …, 2013 - dl.acm.org
This paper presents Heracles, an open-source, functional, parameterized, synthesizable
multicore system toolkit. Such a multi/many-core design platform is a powerful and versatile …