Clairvoyance: Look-ahead compile-time scheduling

KA Tran, TE Carlson, K Koukos… - 2017 IEEE/ACM …, 2017 - ieeexplore.ieee.org
To enhance the performance of memory-bound applications, hardware designs have been
developed to hide memory latency, such as the out-of-order (OoO) execution engine, at the …

Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution

J Matějka, B Forsberg, M Sojka, Z Hanzálek… - Proceedings of the 9th …, 2018 - dl.acm.org
Many applications require both high performance and predictable timing. High performance
can be provided by COTS Multi-Core System on Chips (MPSoC); however, as cores in these …

HePREM: A predictable execution model for GPU-based heterogeneous SoCs

B Forsberg, L Benini, A Marongiu - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
The ever-increasing need for computational power in embedded devices has led to the
adoption of heterogeneous SoCs combining a general purpose CPU with a data parallel …

HePREM: Enabling predictable GPU execution on heterogeneous SoC

B Forsberg, L Benini, A Marongiu - 2018 Design, Automation & …, 2018 - ieeexplore.ieee.org
Heterogeneous systems-on-a-chip are increasingly embracing shared memory designs, in
which a single DRAM is used for both the main CPU and an integrated GPU. This …

Combining PREM compilation and static scheduling for high-performance and predictable MPSoC execution

J Matějka, B Forsberg, M Sojka, P Šůcha, L Benini… - Parallel computing, 2019 - Elsevier
Many applications require both high performance and predictable timing. High performance
can be provided by COTS Multi-Core System on Chips (MPSoC); however, as cores in these …

SWOOP: Software-hardware co-design for non-speculative, execute-ahead, in-order cores

KA Tran, A Jimborean, TE Carlson, K Koukos… - Proceedings of the 39th …, 2018 - dl.acm.org
Increasing demands for energy efficiency constrain emerging hardware. These new
hardware trends challenge the established assumptions in code generation and force us to …

SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems

Q Wu, A Ekanayake, R Li, J Beard, L John - Proceedings of the 51st …, 2022 - dl.acm.org
With increasing core counts and multiple levels of cache memories, scaling multi-threaded
and task-level parallel workloads is continuously becoming a challenge. A key challenge to …

TaDA: Task Decoupling Architecture for the Battery-less Internet of Things

W Song, S Kaxiras, T Voigt, Y Yao… - Proceedings of the 22nd …, 2024 - dl.acm.org
We present TaDA, a system architecture enabling efficient execution of Internet of Things
(IoT) applications across multiple computing units, powered by ambient energy harvesting …

Planar: a programmable accelerator for near-memory data rearrangement

A Barredo, A Armejach, J Beard, M Moreto - Proceedings of the ACM …, 2021 - dl.acm.org
Many applications employ irregular and sparse memory accesses that cannot take
advantage of existing cache hierarchies in high performance processors. To solve this …

Microarchitecture-aware code generation for deep learning on single-ISA heterogeneous multi-core mobile processors

J Park, Y Kwon, Y Park, D Jeon - IEEE Access, 2019 - ieeexplore.ieee.org
While single-ISA heterogeneous multi-core processors are widely used in mobile
computing, typical code generations optimize the code for a single target core, leaving it less …