Crono: A benchmark suite for multithreaded graph algorithms executing on futuristic multicores
Algorithms operating on a graph setting are known to be highly irregular and unstructured.
This leads to workload imbalance and data locality challenges when these algorithms are …
An abacus turn model for time/space-efficient reconfigurable routing
Applications' traffic tends to be bursty and the location of hot-spot nodes moves as time goes
by. This will significantly aggravate the blocking problem of wormhole-routed Network-on …
Self-aware computing in the Angstrom processor
Addressing the challenges of extreme scale computing requires holistic design of new
programming models and systems that support those models. This paper discusses the …
Leveraging latency-insensitivity to ease multiple FPGA design
Traditionally, hardware designs partitioned across multiple FPGAs have had low
performance due to the inefficiency of maintaining cycle-by-cycle timing among discrete …
DARSIM: a parallel cycle-level NoC simulator
We present DARSIM, a parallel, highly configurable, cycle-level network-on-chip simulator
based on an ingress-queued wormhole router architecture. The parallel simulation engine …
Hornet: A cycle-level multicore simulator
We present hornet, a parallel, highly configurable, cycle-level multicore simulator based on
an ingress-queued wormhole router network-on-chip (NoC) architecture. The parallel …
Bandwidth-optimal all-to-all exchanges in fat tree networks
The personalized all-to-all collective exchange is one of the most challenging
communication patterns in HPC applications in terms of performance and scalability. In the …
Scalable, accurate multicore simulation in the 1000-core era
We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based
on an ingress-queued worm-hole router NoC architecture. The parallel simulation engine …
Scalable interconnects for reconfigurable spatial architectures
Recent years have seen the increased adoption of Coarse-Grained Reconfigurable
Architectures (CGRAs) as flexible, energy-efficient compute accelerators. Obtaining …
Heracles: a tool for fast RTL-based design space exploration of multicore processors
This paper presents Heracles, an open-source, functional, parameterized, synthesizable
multicore system toolkit. Such a multi/many-core design platform is a powerful and versatile …