Batch-aware unified memory management in GPUs for irregular workloads

Y Choi, Y Kim, M Rhu - 2021 IEEE International Symposium on …, 2021 - ieeexplore.ieee.org

In cloud ML inference systems, batching is an essential technique to increase throughput
which helps optimize total-cost-of-ownership. Prior graph batching combines the individual …

Cited by 66 Related articles All 6 versions

[PDF] edu.krd

Impact of Distributed-Memory Parallel Processing Approach on Performance Enhancing of Multicomputer-Multicore Systems: A Review

DM Abdulqader, SRM Zeebaree - Qalaai Zanist Journal, 2021 - journal.lfu.edu.krd

Distributed memory is a term used in computer science to describe a multiprocessor
computer system in which each processor has its own private memory. Computational jobs …

Cited by 15 Related articles All 4 versions

[PDF] arxiv.org

G10: Enabling an efficient unified GPU memory and storage architecture with smart tensor migrations

H Zhang, Y Zhou, Y Xue, Y Liu, J Huang - … of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org

To break the GPU memory wall for scaling deep learning workloads, a variety of architecture
and system techniques have been proposed recently. Their typical approaches include …

Cited by 16 Related articles All 9 versions

[PDF] acm.org

In-depth analyses of unified virtual memory system for GPU accelerated computing

T Allen, R Ge - Proceedings of the International Conference for High …, 2021 - dl.acm.org

The abstraction of a shared memory space over separate CPU and GPU memory domains
has eased the burden of portability for many HPC codebases. However, users pay for the …

Cited by 39 Related articles All 5 versions

[PDF] osti.gov

Traversing large graphs on GPUs with unified memory

P Gera, H Kim, P Sao, H Kim, D Bader - Proceedings of the VLDB …, 2020 - dl.acm.org

Due to the limited capacity of GPU memory, the majority of prior work on graph applications
on GPUs has been restricted to graphs of modest sizes that fit in memory. Recent hardware …

Cited by 58 Related articles All 11 versions

[PDF] usenix.org

MGG: Accelerating graph neural networks with Fine-Grained Intra-Kernel Communication-Computation pipelining on Multi-GPU platforms

Y Wang, B Feng, Z Wang, T Geng, K Barker… - … USENIX Symposium on …, 2023 - usenix.org

The increasing size of input graphs for graph neural networks (GNNs) highlights the demand
for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the …

Cited by 24 Related articles All 5 versions

[PDF] arxiv.org

EMOGI: Efficient memory-access for out-of-memory graph-traversal in GPUs

SW Min, VS Mailthody, Z Qureshi, J Xiong… - arXiv preprint arXiv …, 2020 - arxiv.org

Modern analytics and recommendation systems are increasingly based on graph data that
capture the relations between entities being analyzed. Practical graphs come in huge sizes …

Cited by 60 Related articles All 11 versions

[PDF] acm.org

Grus: Toward unified-memory-efficient high-performance graph processing on GPU

P Wang, J Wang, C Li, J Wang, H Zhu… - ACM Transactions on …, 2021 - dl.acm.org

Today's GPU graph processing frameworks face scalability and efficiency issues as the
graph size exceeds GPU-dedicated memory limit. Although recent GPUs can over-subscribe …

Cited by 40 Related articles

[PDF] acm.org

Harnessing integrated CPU-GPU system memory for HPC: a first look into Grace Hopper

G Schieffer, J Wahlgren, J Ren, J Faj… - Proceedings of the 53rd …, 2024 - dl.acm.org

Memory management across discrete CPU and GPU physical memory is traditionally
achieved through explicit GPU allocations and data copy or unified virtual memory. The …

Cited by 5 Related articles All 4 versions

DeepUM: Tensor migration and prefetching in unified memory

J Jung, J Kim, J Lee - Proceedings of the 28th ACM International …, 2023 - dl.acm.org

Deep neural networks (DNNs) are continuing to get wider and deeper. As a result, it requires
a tremendous amount of GPU memory and computing power. In this paper, we propose a …

Cited by 22 Related articles