Clairvoyance: Look-ahead compile-time scheduling
To enhance the performance of memory-bound applications, hardware designs have been
developed to hide memory latency, such as the out-of-order (OoO) execution engine, at the …
Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution
J Matějka, B Forsberg, M Sojka, Z Hanzálek… - Proceedings of the 9th …, 2018 - dl.acm.org
Many applications require both high performance and predictable timing. High-performance
can be provided by COTS Multi-Core System on Chips (MPSoC), however, as cores in these …
HePREM: A predictable execution model for GPU-based heterogeneous SoCs
B Forsberg, L Benini, A Marongiu - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
The ever-increasing need for computational power in embedded devices has led to the
adoption of heterogeneous SoCs combining a general purpose CPU with a data parallel …
HePREM: Enabling predictable GPU execution on heterogeneous SoC
B Forsberg, L Benini, A Marongiu - 2018 Design, Automation & …, 2018 - ieeexplore.ieee.org
Heterogeneous systems-on-a-chip are increasingly embracing shared memory designs, in
which a single DRAM is used for both the main CPU and an integrated GPU. This …
Combining PREM compilation and static scheduling for high-performance and predictable MPSoC execution
Many applications require both high performance and predictable timing. High-performance
can be provided by COTS Multi-Core System on Chips (MPSoC), however, as cores in these …
SWOOP: Software-hardware co-design for non-speculative, execute-ahead, in-order cores
Increasing demands for energy efficiency constrain emerging hardware. These new
hardware trends challenge the established assumptions in code generation and force us to …
SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems
With increasing core counts and multiple levels of cache memories, scaling multi-threaded
and task-level parallel workloads is continuously becoming a challenge. A key challenge to …
TaDA: Task Decoupling Architecture for the Battery-less Internet of Things
We present TaDA, a system architecture enabling efficient execution of Internet of Things
(IoT) applications across multiple computing units, powered by ambient energy harvesting …
Planar: a programmable accelerator for near-memory data rearrangement
Many applications employ irregular and sparse memory accesses that cannot take
advantage of existing cache hierarchies in high performance processors. To solve this …
Microarchitecture-aware code generation for deep learning on single-isa heterogeneous multi-core mobile processors
While single-ISA heterogeneous multi-core processors are widely used in mobile
computing, typical code generations optimize the code for a single target core, leaving it less …