C-LSTM: Enabling efficient LSTM using structured compression techniques on FPGAs

S Wang, Z Li, C Ding, B Yuan, Q Qiu, Y Wang… - Proceedings of the …, 2018 - dl.acm.org
Recently, significant accuracy improvement has been achieved for acoustic recognition
systems by increasing the model size of Long Short-Term Memory (LSTM) networks …

An asynchronous dataflow-driven execution model for distributed accelerator computing

P Salzmann, F Knorr, P Thoman… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
While domain-specific HPC software packages continue to thrive and are vital to many
scientific communities, a general purpose high-productivity GPU cluster programming model …

Adaptive optimization for OpenCL programs on embedded heterogeneous systems

B Taylor, VS Marco, Z Wang - ACM SIGPLAN Notices, 2017 - dl.acm.org
Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in
today's embedded systems. These architectures offer potential for energy efficient computing …

Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL

MA Dávila Guzmán, R Nozal, R Gran Tejero… - The Journal of …, 2019 - Springer
Heterogeneous systems are the core architecture of most of the high-performance
computing nodes, due to their excellent performance and energy efficiency. However, a key …

Troodon: A machine-learning based load-balancing application scheduler for CPU–GPU system

YN Khalid, M Aleem, U Ahmed, MA Islam… - Journal of Parallel and …, 2019 - Elsevier
Heterogeneous computing machines consisting of a CPU and one or more GPUs are
increasingly being used today because of their higher performance-cost ratio and lower …

Efficient and fair multi-programming in GPUs via effective bandwidth management

H Wang, F Luo, M Ibrahim, O Kayiran… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Managing the thread-level parallelism (TLP) of GPGPU applications by limiting it to a certain
degree is known to be effective in improving the overall performance. However, we find that …

Sigmoid: An auto-tuned load balancing algorithm for heterogeneous systems

B Pérez, E Stafford, JL Bosque, R Beivide - Journal of Parallel and …, 2021 - Elsevier
A challenge that heterogeneous system programmers face is leveraging the performance of
all the devices that integrate the system. This paper presents Sigmoid, a new load balancing …

Optimal Model Partitioning with Low-Overhead Profiling on the PIM-based Platform for Deep Learning Inference

SY Kim, J Lee, Y Paik, CH Kim, WJ Lee… - ACM Transactions on …, 2024 - dl.acm.org
Recently Processing-in-Memory (PIM) has become a promising solution to achieve energy-efficient
computation in data-intensive applications by placing computation near or inside …

E-OSched: a load balancing scheduler for heterogeneous multicores

YN Khalid, M Aleem, R Prodan, MA Iqbal… - The Journal of …, 2018 - Springer
The contemporary multicore era has adhered to the heterogeneous computing devices as
one of the proficient platforms to execute compute-intensive applications. These …

Heterogeneous energy-aware load balancing for industry 4.0 and IoT environments

U Ahmed, JCW Lin, G Srivastava - ACM Transactions on Management …, 2022 - dl.acm.org
With the improvement of global infrastructure, Cyber-Physical Systems (CPS) have become
an important component of Industry 4.0. Both the application as well as the machine work …