- Academic resource search

Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org

In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Cited by 51 · Related articles · All 3 versions

[PDF] ieee.org

Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system

J Gómez-Luna, I El Hajj, I Fernandez… - IEEE …, 2022 - ieeexplore.ieee.org

Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …

Cited by 90 · Related articles · All 3 versions

[PDF] arxiv.org

CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication

W Liu, B Vinter - Proceedings of the 29th ACM on International …, 2015 - dl.acm.org

Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous
applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage …

Cited by 353 · Related articles · All 5 versions

[PDF] cam.ac.uk

Scalable GPU graph traversal

D Merrill, M Garland, A Grimshaw - ACM Sigplan Notices, 2012 - dl.acm.org

Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many
higher-level graph analysis algorithms. It is also representative of a class of parallel computations …

Cited by 720 · Related articles · All 17 versions

[PDF] ethz.ch

Benchmarking a new paradigm: An experimental analysis of a real processing-in-memory architecture

J Gómez-Luna, IE Hajj, I Fernandez… - arXiv preprint arXiv …, 2021 - arxiv.org

Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …

Cited by 83 · Related articles · All 3 versions

[PDF] arxiv.org

GPU-accelerated compression and visualization of large-scale vessel trajectories in maritime IoT industries

Y Huang, Y Li, Z Zhang, RW Liu - IEEE Internet of Things …, 2020 - ieeexplore.ieee.org

The automatic identification system (AIS), an automatic vessel-tracking system, has been
widely adopted to perform intelligent traffic management and collision avoidance services in …

Cited by 110 · Related articles · All 5 versions

[PDF] escholarship.org

Fast tridiagonal solvers on the GPU

Y Zhang, J Cohen, JD Owens - ACM Sigplan Notices, 2010 - dl.acm.org

We study the performance of three parallel algorithms and their hybrid variants for solving
tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (PCR) …

Cited by 342 · Related articles · All 14 versions

High performance and scalable radix sorting: A case study of implementing dynamic parallelism for GPU computing

D Merrill, A Grimshaw - Parallel Processing Letters, 2011 - World Scientific

The need to rank and order data is pervasive, and many algorithms are fundamentally
dependent upon sorting and partitioning operations. Prior to this work, GPU stream …

Cited by 250 · Related articles · All 7 versions

[PDF] ncsu.edu

yaSpMV: Yet another SpMV framework on GPUs

S Yan, C Li, Y Zhang, H Zhou - ACM Sigplan Notices, 2014 - dl.acm.org

SpMV is a key linear algebra algorithm and has been widely used in many important
application domains. As a result, numerous attempts have been made to optimize SpMV on …

Cited by 186 · Related articles · All 7 versions

[PDF] escholarship.org

[BOOK][B] Efficient parallel scan algorithms for many-core GPUs

S Sengupta, MJ Harris, M Garland, JD Owens - 2011 - api.taylorfrancis.com

We have witnessed a phenomenal increase in computational resources for graphics
processing units (GPUs) over the last few years. The highest performing graphics processors …

Cited by 258 · Related articles · All 12 versions