C-LSTM: Enabling efficient LSTM using structured compression techniques on FPGAs
Recently, significant accuracy improvement has been achieved for acoustic recognition
systems by increasing the model size of Long Short-Term Memory (LSTM) networks …
An asynchronous dataflow-driven execution model for distributed accelerator computing
While domain-specific HPC software packages continue to thrive and are vital to many
scientific communities, a general purpose high-productivity GPU cluster programming model …
Adaptive optimization for OpenCL programs on embedded heterogeneous systems
Heterogeneous multi-core architectures consisting of CPUs and GPUs are commonplace in
today's embedded systems. These architectures offer potential for energy efficient computing …
Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL
Heterogeneous systems are the core architecture of most of the high-performance
computing nodes, due to their excellent performance and energy efficiency. However, a key …
Troodon: A machine-learning based load-balancing application scheduler for CPU–GPU system
Heterogeneous computing machines consisting of a CPU and one or more GPUs are
increasingly being used today because of their higher performance-cost ratio and lower …
Efficient and fair multi-programming in GPUs via effective bandwidth management
Managing the thread-level parallelism (TLP) of GPGPU applications by limiting it to a certain
degree is known to be effective in improving the overall performance. However, we find that …
Sigmoid: An auto-tuned load balancing algorithm for heterogeneous systems
A challenge that heterogeneous system programmers face is leveraging the performance of
all the devices that integrate the system. This paper presents Sigmoid, a new load balancing …
Optimal Model Partitioning with Low-Overhead Profiling on the PIM-based Platform for Deep Learning Inference
Recently Processing-in-Memory (PIM) has become a promising solution to achieve energy-
efficient computation in data-intensive applications by placing computation near or inside …
E-OSched: a load balancing scheduler for heterogeneous multicores
The contemporary multicore era has adhered to the heterogeneous computing devices as
one of the proficient platforms to execute compute-intensive applications. These …
Heterogeneous energy-aware load balancing for industry 4.0 and IoT environments
U Ahmed, JCW Lin, G Srivastava - ACM Transactions on Management …, 2022 - dl.acm.org
With the improvement of global infrastructure, Cyber-Physical Systems (CPS) have become
an important component of Industry 4.0. Both the application as well as the machine work …