Optimization techniques for GPU programming
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …
Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …
CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication
W Liu, B Vinter - Proceedings of the 29th ACM on International …, 2015 - dl.acm.org
Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous
applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage …
Scalable GPU graph traversal
D Merrill, M Garland, A Grimshaw - ACM Sigplan Notices, 2012 - dl.acm.org
Breadth-first search (BFS) is a core primitive for graph traversal and a basis for many higher-
level graph analysis algorithms. It is also representative of a class of parallel computations …
Benchmarking a new paradigm: An experimental analysis of a real processing-in-memory architecture
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …
GPU-accelerated compression and visualization of large-scale vessel trajectories in maritime IoT industries
Y Huang, Y Li, Z Zhang, RW Liu - IEEE Internet of Things …, 2020 - ieeexplore.ieee.org
The automatic identification system (AIS), an automatic vessel-tracking system, has been
widely adopted to perform intelligent traffic management and collision avoidance services in …
Fast tridiagonal solvers on the GPU
We study the performance of three parallel algorithms and their hybrid variants for solving
tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (PCR) …
High performance and scalable radix sorting: A case study of implementing dynamic parallelism for GPU computing
D Merrill, A Grimshaw - Parallel Processing Letters, 2011 - World Scientific
The need to rank and order data is pervasive, and many algorithms are fundamentally
dependent upon sorting and partitioning operations. Prior to this work, GPU stream …
yaSpMV: Yet another SpMV framework on GPUs
SpMV is a key linear algebra algorithm and has been widely used in many important
application domains. As a result, numerous attempts have been made to optimize SpMV on …
[BOOK][B] Efficient parallel scan algorithms for many-core GPUs
We have witnessed a phenomenal increase in computational resources for graphics
processors units (GPU) over the last few years. The highest performing graphics processors …