A survey of CPU-GPU heterogeneous computing techniques

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2015 - dl.acm.org
As both CPUs and GPUs become employed in a wide range of applications, it has been
acknowledged that both of these Processing Units (PUs) have their unique features and …

Embracing diversity in the Barrelfish manycore operating system

A Schüpbach, S Peter, A Baumann… - Proceedings of the …, 2008 - asq.gribex.net
We discuss diversity and heterogeneity in manycore computer systems, and identify three
distinct types of diversity, all of which present challenges to operating system designers and …

An efficient, model-based CPU-GPU heterogeneous FFT library

Y Ogata, T Endo, N Maruyama… - 2008 IEEE international …, 2008 - ieeexplore.ieee.org
General-Purpose computing on Graphics Processing Units (GPGPU) is becoming popular in
HPC because of its high peak performance. However, in spite of the potential performance …

Energy-efficient acceleration of deep neural networks on realtime-constrained embedded edge devices

B Kim, S Lee, AR Trivedi, WJ Song - IEEE Access, 2020 - ieeexplore.ieee.org
This paper presents a hardware management technique that enables energy-efficient
acceleration of deep neural networks (DNNs) on realtime-constrained embedded edge …

Optimization of sparse matrix-vector multiplication with variant CSR on GPUs

X Feng, H Jin, R Zheng, K Hu, J Zeng… - 2011 IEEE 17th …, 2011 - ieeexplore.ieee.org
Sparse Matrix-Vector multiplication (SpMV) is one of the most significant yet challenging
issues in computational science area. It is a memory-bound application whose performance …

Processing data streams with hard real-time constraints on heterogeneous systems

U Verner, A Schuster, M Silberstein - Proceedings of the international …, 2011 - dl.acm.org
Data stream processing applications such as stock exchange data analysis, VoIP streaming,
and sensor data processing pose two conflicting challenges: short per-stream latency--to …

Matrix multiplication on high-density multi-GPU architectures: theoretical and experimental investigations

P Zhang, Y Gao - … Computing: 30th International Conference, ISC High …, 2015 - Springer
Matrix multiplication (MM) is one of the core problems in the high performance computing
domain and its efficiency impacts performances of almost all matrix problems. The high …

Optimization of quasi-diagonal matrix–vector multiplication on GPU

W Yang, K Li, Y Liu, L Shi… - The international journal …, 2014 - journals.sagepub.com
Sparse matrix–vector multiplication (SpMV) is of singular importance in sparse linear
algebra, which is an important issue in scientific computing and engineering practice. Much …

An efficient GPU implementation of the revised simplex method

J Bieling, P Peschlow, P Martini - 2010 IEEE International …, 2010 - ieeexplore.ieee.org
The computational power provided by the massive parallelism of modern graphics
processing units (GPUs) has moved increasingly into focus over the past few years. In …

Flexi-BOPI: Flexible Granularity Pipeline Inference with Bayesian Optimization for Deep Learning Models on HMPSoC

Z Wang, P Yang, B Zhang, L Hu, W Lv, C Lin… - Information Sciences, 2024 - Elsevier
To achieve high-throughput deep learning (DL) model inference on heterogeneous
multiprocessor systems-on-chip (HMPSoC) platforms, the use of pipelining for the …