Lazy batching: An SLA-aware batching system for cloud machine learning inference

Y Choi, Y Kim, M Rhu - 2021 IEEE International Symposium on …, 2021 - ieeexplore.ieee.org
In cloud ML inference systems, batching is an essential technique to increase throughput,
which helps optimize total-cost-of-ownership. Prior graph batching combines the individual …

Impact of Distributed-Memory Parallel Processing Approach on Performance Enhancing of Multicomputer-Multicore Systems: A Review

DM Abdulqader, SRM Zeebaree - Qalaai Zanist Journal, 2021 - journal.lfu.edu.krd
Distributed memory is a term used in computer science to describe a multiprocessor
computer system in which each processor has its own private memory. Computational jobs …

G10: Enabling an efficient unified GPU memory and storage architecture with smart tensor migrations

H Zhang, Y Zhou, Y Xue, Y Liu, J Huang - … of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
To break the GPU memory wall for scaling deep learning workloads, a variety of architecture
and system techniques have been proposed recently. Their typical approaches include …

In-depth analyses of unified virtual memory system for GPU accelerated computing

T Allen, R Ge - Proceedings of the International Conference for High …, 2021 - dl.acm.org
The abstraction of a shared memory space over separate CPU and GPU memory domains
has eased the burden of portability for many HPC codebases. However, users pay for the …

Traversing large graphs on GPUs with unified memory

P Gera, H Kim, P Sao, H Kim, D Bader - Proceedings of the VLDB …, 2020 - dl.acm.org
Due to the limited capacity of GPU memory, the majority of prior work on graph applications
on GPUs has been restricted to graphs of modest sizes that fit in memory. Recent hardware …

MGG: Accelerating graph neural networks with Fine-Grained Intra-Kernel Communication-Computation pipelining on Multi-GPU platforms

Y Wang, B Feng, Z Wang, T Geng, K Barker… - … USENIX Symposium on …, 2023 - usenix.org
The increasing size of input graphs for graph neural networks (GNNs) highlights the demand
for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the …

EMOGI: Efficient memory-access for out-of-memory graph-traversal in GPUs

SW Min, VS Mailthody, Z Qureshi, J Xiong… - arXiv preprint arXiv …, 2020 - arxiv.org
Modern analytics and recommendation systems are increasingly based on graph data that
capture the relations between entities being analyzed. Practical graphs come in huge sizes …

Grus: Toward unified-memory-efficient high-performance graph processing on GPU

P Wang, J Wang, C Li, J Wang, H Zhu… - ACM Transactions on …, 2021 - dl.acm.org
Today's GPU graph processing frameworks face scalability and efficiency issues as the
graph size exceeds GPU-dedicated memory limit. Although recent GPUs can over-subscribe …

Harnessing integrated CPU-GPU system memory for HPC: a first look into Grace Hopper

G Schieffer, J Wahlgren, J Ren, J Faj… - Proceedings of the 53rd …, 2024 - dl.acm.org
Memory management across discrete CPU and GPU physical memory is traditionally
achieved through explicit GPU allocations and data copy or unified virtual memory. The …

DeepUM: Tensor migration and prefetching in unified memory

J Jung, J Kim, J Lee - Proceedings of the 28th ACM International …, 2023 - dl.acm.org
Deep neural networks (DNNs) are continuing to get wider and deeper. As a result, it requires
a tremendous amount of GPU memory and computing power. In this paper, we propose a …