Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpo...

KA Tran, TE Carlson, K Koukos… - 2017 IEEE/ACM …, 2017 - ieeexplore.ieee.org

To enhance the performance of memory-bound applications, hardware designs have been
developed to hide memory latency, such as the out-of-order (OoO) execution engine, at the …

Cited by 35 · Related articles · All 11 versions

Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution

J Matějka, B Forsberg, M Sojka, Z Hanzálek… - Proceedings of the 9th …, 2018 - dl.acm.org

Many applications require both high performance and predictable timing. High-performance
can be provided by COTS Multi-Core System on Chips (MPSoC), however, as cores in these …

Cited by 30 · Related articles · All 9 versions

HePREM: A predictable execution model for GPU-based heterogeneous SoCs

B Forsberg, L Benini, A Marongiu - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

The ever-increasing need for computational power in embedded devices has led to the
adoption of heterogeneous SoCs combining a general purpose CPU with a data parallel …

Cited by 18 · Related articles · All 5 versions

HePREM: Enabling predictable GPU execution on heterogeneous SoC

B Forsberg, L Benini, A Marongiu - 2018 Design, Automation & …, 2018 - ieeexplore.ieee.org

Heterogeneous systems-on-a-chip are increasingly embracing shared memory designs, in
which a single DRAM is used for both the main CPU and an integrated GPU. This …

Cited by 26 · Related articles · All 7 versions

Combining PREM compilation and static scheduling for high-performance and predictable MPSoC execution

J Matějka, B Forsberg, M Sojka, P Šůcha, L Benini… - Parallel computing, 2019 - Elsevier

Many applications require both high performance and predictable timing. High-performance
can be provided by COTS Multi-Core System on Chips (MPSoC), however, as cores in these …

Cited by 19 · Related articles · All 9 versions

SWOOP: Software-hardware co-design for non-speculative, execute-ahead, in-order cores

KA Tran, A Jimborean, TE Carlson, K Koukos… - Proceedings of the 39th …, 2018 - dl.acm.org

Increasing demands for energy efficiency constrain emerging hardware. These new
hardware trends challenge the established assumptions in code generation and force us to …

Cited by 18 · Related articles · All 10 versions

SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems

Q Wu, A Ekanayake, R Li, J Beard, L John - Proceedings of the 51st …, 2022 - dl.acm.org

With increasing core counts and multiple levels of cache memories, scaling multi-threaded
and task-level parallel workloads is continuously becoming a challenge. A key challenge to …

Cited by 3 · Related articles · All 2 versions

TaDA: Task Decoupling Architecture for the Battery-less Internet of Things

W Song, S Kaxiras, T Voigt, Y Yao… - Proceedings of the 22nd …, 2024 - dl.acm.org

We present TaDA, a system architecture enabling efficient execution of Internet of Things
(IoT) applications across multiple computing units, powered by ambient energy harvesting …

Planar: a programmable accelerator for near-memory data rearrangement

A Barredo, A Armejach, J Beard, M Moreto - Proceedings of the ACM …, 2021 - dl.acm.org

Many applications employ irregular and sparse memory accesses that cannot take
advantage of existing cache hierarchies in high performance processors. To solve this …

Cited by 4 · Related articles · All 2 versions

Microarchitecture-aware code generation for deep learning on single-ISA heterogeneous multi-core mobile processors

J Park, Y Kwon, Y Park, D Jeon - IEEE Access, 2019 - ieeexplore.ieee.org

While single-ISA heterogeneous multi-core processors are widely used in mobile
computing, typical code generations optimize the code for a single target core, leaving it less …

Cited by 7 · Related articles · All 7 versions