Parallel processing of matrix multiplication in a CPU and GPU heterogeneous environment

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2015 - dl.acm.org

As both CPUs and GPUs become employed in a wide range of applications, it has been
acknowledged that both of these Processing Units (PUs) have their unique features and …

Cited by 699 Related articles All 7 versions

[PDF] Embracing diversity in the Barrelfish manycore operating system

A Schüpbach, S Peter, A Baumann… - Proceedings of the …, 2008 - asq.gribex.net

We discuss diversity and heterogeneity in manycore computer systems, and identify three
distinct types of diversity, all of which present challenges to operating system designers and …

Cited by 164 Related articles All 17 versions

An efficient, model-based CPU-GPU heterogeneous FFT library

Y Ogata, T Endo, N Maruyama… - 2008 IEEE international …, 2008 - ieeexplore.ieee.org

General-Purpose computing on Graphics Processing Units (GPGPU) is becoming popular in
HPC because of its high peak performance. However, in spite of the potential performance …

Cited by 122 Related articles All 5 versions

Energy-efficient acceleration of deep neural networks on realtime-constrained embedded edge devices

B Kim, S Lee, AR Trivedi, WJ Song - IEEE Access, 2020 - ieeexplore.ieee.org

This paper presents a hardware management technique that enables energy-efficient
acceleration of deep neural networks (DNNs) on realtime-constrained embedded edge …

Cited by 31 Related articles All 3 versions

Optimization of sparse matrix-vector multiplication with variant CSR on GPUs

X Feng, H Jin, R Zheng, K Hu, J Zeng… - 2011 IEEE 17th …, 2011 - ieeexplore.ieee.org

Sparse Matrix-Vector multiplication (SpMV) is one of the most significant yet challenging
issues in computational science area. It is a memory-bound application whose performance …

Cited by 68 Related articles All 5 versions

Processing data streams with hard real-time constraints on heterogeneous systems

U Verner, A Schuster, M Silberstein - Proceedings of the international …, 2011 - dl.acm.org

Data stream processing applications such as stock exchange data analysis, VoIP streaming,
and sensor data processing pose two conflicting challenges: short per-stream latency--to …

Cited by 68 Related articles All 3 versions

Matrix multiplication on high-density multi-GPU architectures: theoretical and experimental investigations

P Zhang, Y Gao - … Computing: 30th International Conference, ISC High …, 2015 - Springer

Matrix multiplication (MM) is one of the core problems in the high performance computing
domain and its efficiency impacts performances of almost all matrix problems. The high …

Cited by 52 Related articles

Optimization of quasi-diagonal matrix–vector multiplication on GPU

W Yang, K Li, Y Liu, L Shi… - The international journal …, 2014 - journals.sagepub.com

Sparse matrix–vector multiplication (SpMV) is of singular importance in sparse linear
algebra, which is an important issue in scientific computing and engineering practice. Much …

Cited by 42 Related articles All 6 versions

An efficient GPU implementation of the revised simplex method

J Bieling, P Peschlow, P Martini - 2010 IEEE International …, 2010 - ieeexplore.ieee.org

The computational power provided by the massive parallelism of modern graphics
processing units (GPUs) has moved increasingly into focus over the past few years. In …

Cited by 57 Related articles All 4 versions

Flexi-BOPI: Flexible Granularity Pipeline Inference with Bayesian Optimization for Deep Learning Models on HMPSoC

Z Wang, P Yang, B Zhang, L Hu, W Lv, C Lin… - Information Sciences, 2024 - Elsevier

To achieve high-throughput deep learning (DL) model inference on heterogeneous
multiprocessor systems-on-chip (HMPSoC) platforms, the use of pipelining for the …