LazyBatching: An SLA-aware batching system for cloud machine learning inference
In cloud ML inference systems, batching is an essential technique for increasing throughput,
which in turn helps optimize total cost of ownership. Prior graph batching combines the individual …
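The snippet above is truncated, but the core idea it names, batching under an SLA, can be illustrated with a small host-side sketch: keep growing the batch only while the oldest queued request still has enough deadline slack to absorb the estimated batch latency. The latency model and every name below are illustrative assumptions, not the scheduler described in the paper.

// Minimal sketch of SLA-aware batching: launch the batch when it is full, or when
// waiting any longer would push the oldest request past its deadline. The latency
// model and all identifiers are hypothetical, not taken from the paper.
#include <chrono>
#include <cstdio>
#include <deque>

using Clock = std::chrono::steady_clock;

struct Request {
    int id;
    Clock::time_point arrival;
    std::chrono::milliseconds sla;  // end-to-end deadline for this request
};

// Assumed latency model: a fixed per-batch cost plus a per-request cost.
std::chrono::milliseconds estimated_batch_latency(size_t batch_size) {
    return std::chrono::milliseconds(5 + 2 * static_cast<long long>(batch_size));
}

// Launch if the batch is full, or if waiting for one more request would
// exhaust the oldest request's remaining slack.
bool should_launch(const std::deque<Request>& queue, size_t max_batch) {
    if (queue.empty()) return false;
    if (queue.size() >= max_batch) return true;
    const Request& oldest = queue.front();
    auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        Clock::now() - oldest.arrival);
    auto slack = oldest.sla - elapsed;
    return estimated_batch_latency(queue.size() + 1) > slack;
}

int main() {
    std::deque<Request> queue;
    for (int i = 0; i < 4; ++i)
        queue.push_back({i, Clock::now(), std::chrono::milliseconds(20)});
    if (should_launch(queue, /*max_batch=*/8))
        std::printf("launch batch of %zu requests now\n", queue.size());
    else
        std::printf("keep waiting for more requests\n");
    return 0;
}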
Impact of Distributed-Memory Parallel Processing Approach on Performance Enhancing of Multicomputer-Multicore Systems: A Review
DM Abdulqader, SRM Zeebaree - Qalaai Zanist Journal, 2021 - journal.lfu.edu.krd
Distributed memory is a term used in computer science to describe a multiprocessor
computer system in which each processor has its own private memory. Computational jobs …
G10: Enabling an efficient unified GPU memory and storage architecture with smart tensor migrations
To break the GPU memory wall for scaling deep learning workloads, a variety of architecture
and system techniques have been proposed recently. Their typical approaches include …
In-depth analyses of unified virtual memory system for GPU accelerated computing
The abstraction of a shared memory space over separate CPU and GPU memory domains
has eased the burden of portability for many HPC codebases. However, users pay for the …
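For readers unfamiliar with the abstraction this paper analyzes, the short CUDA sketch below shows the unified virtual memory mechanism: a single cudaMallocManaged allocation is written on the CPU and read by a GPU kernel, with pages migrated on demand by the driver. It is a generic illustration of the API, not code from the paper.

// Minimal unified virtual memory illustration: one pointer is valid on both the
// CPU and the GPU, and the driver migrates pages on demand at fault time.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float* data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));     // visible to CPU and GPU

    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // CPU touches the pages first

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);  // pages fault over to the GPU
    cudaDeviceSynchronize();

    std::printf("data[0] = %f\n", data[0]);          // pages migrate back on CPU access
    cudaFree(data);
    return 0;
}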
Traversing large graphs on GPUs with unified memory
Due to the limited capacity of GPU memory, the majority of prior work on graph applications
on GPUs has been restricted to graphs of modest sizes that fit in memory. Recent hardware …
MGG: Accelerating graph neural networks with fine-grained intra-kernel communication-computation pipelining on multi-GPU platforms
The increasing size of input graphs for graph neural networks (GNNs) highlights the demand
for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the …
EMOGI: Efficient memory-access for out-of-memory graph-traversal in GPUs
Modern analytics and recommendation systems are increasingly based on graph data that
capture the relations between entities being analyzed. Practical graphs come in huge sizes …
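EMOGI's central idea is to let GPU threads read graph data directly from host memory rather than migrating pages into device memory. The hedged sketch below shows only the underlying zero-copy mechanism (pinned, mapped host memory) with a trivial kernel; the paper's coalescing and alignment optimizations are not reproduced here.

// Zero-copy access sketch: the edge array stays in pinned host memory and GPU
// threads read it in place over the interconnect, instead of paging it in.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sum_edges(const int* edges, int n, unsigned long long* total) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(total, static_cast<unsigned long long>(edges[i]));
}

int main() {
    const int n = 1 << 20;

    int* h_edges = nullptr;  // pinned + mapped: accessible from the GPU in place
    cudaHostAlloc(&h_edges, n * sizeof(int), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_edges[i] = 1;

    int* d_edges = nullptr;  // device-side alias of the same host buffer
    cudaHostGetDevicePointer(&d_edges, h_edges, 0);

    unsigned long long* total = nullptr;
    cudaMallocManaged(&total, sizeof(unsigned long long));
    *total = 0;

    sum_edges<<<(n + 255) / 256, 256>>>(d_edges, n, total);
    cudaDeviceSynchronize();

    std::printf("edge sum = %llu\n", *total);
    cudaFreeHost(h_edges);
    cudaFree(total);
    return 0;
}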
Grus: Toward unified-memory-efficient high-performance graph processing on GPU
Today's GPU graph processing frameworks face scalability and efficiency issues as the
graph size exceeds the GPU-dedicated memory limit. Although recent GPUs can over-subscribe …
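The over-subscription mentioned in this snippet builds on CUDA managed memory plus placement hints. The sketch below is a generic illustration of that mechanism, not code from the Grus framework: an allocation that could exceed device memory is kept homed on the host via cudaMemAdvise, while the GPU is still allowed to access it.

// Over-subscription sketch with managed memory and placement hints. The array
// is sized at 1 GiB here for brevity; in a real over-subscription scenario it
// would exceed device memory. Hints and names are illustrative only.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(float* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    int device = 0;
    cudaGetDevice(&device);

    const size_t n = size_t(1) << 28;            // 2^28 floats = 1 GiB
    float* data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    // Keep the array homed in host memory and let the GPU access it remotely
    // instead of faulting every page into device memory.
    cudaMemAdvise(data, n * sizeof(float), cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);
    cudaMemAdvise(data, n * sizeof(float), cudaMemAdviseSetAccessedBy, device);

    increment<<<(unsigned)((n + 255) / 256), 256>>>(data, n);
    cudaDeviceSynchronize();

    std::printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}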
Harnessing integrated CPU-GPU system memory for HPC: a first look into Grace Hopper
Memory management across discrete CPU and GPU physical memory is traditionally
achieved through explicit GPU allocations and data copy or unified virtual memory. The …
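On hardware with a coherent CPU-GPU interconnect such as Grace Hopper (or on drivers with heterogeneous memory management), a GPU kernel can dereference memory obtained from plain malloc, without cudaMalloc or managed allocations. The sketch below assumes such a platform; it is an illustration of system-allocated memory access, not code from the paper, and it will not run on conventional discrete GPUs.

// System-allocated memory sketch: on a coherent or HMM-capable system the GPU
// can directly access ordinary pageable host memory. The capability is checked
// first; everything else here is an illustrative assumption about the platform.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, 0);
    if (!prop.pageableMemoryAccess) {
        std::printf("this GPU/driver cannot access pageable host memory directly\n");
        return 0;
    }

    const int n = 1 << 20;
    float* data = static_cast<float*>(std::malloc(n * sizeof(float)));  // ordinary heap memory
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);  // kernel dereferences the malloc'd pointer
    cudaDeviceSynchronize();

    std::printf("data[0] = %f\n", data[0]);
    std::free(data);
    return 0;
}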
DeepUM: Tensor migration and prefetching in unified memory
Deep neural networks (DNNs) are continuing to get wider and deeper. As a result, they require
a tremendous amount of GPU memory and computing power. In this paper, we propose a …
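DeepUM's prefetcher is correlation-based and kernel-aware; the sketch below demonstrates only the underlying unified-memory prefetch primitive, moving a managed "tensor" to the GPU ahead of the kernel that consumes it and back to the CPU before host post-processing. Names and sizes are illustrative, not the paper's system.

// Unified-memory prefetching sketch using cudaMemPrefetchAsync. This shows the
// primitive only; DeepUM's correlation-based prefetching policy is far more involved.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void relu(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && x[i] < 0.0f) x[i] = 0.0f;
}

int main() {
    int device = 0;
    cudaGetDevice(&device);
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    const int n = 1 << 22;
    float* tensor = nullptr;
    cudaMallocManaged(&tensor, n * sizeof(float));
    for (int i = 0; i < n; ++i) tensor[i] = (i % 2 ? 1.0f : -1.0f);

    // Prefetch pages to the GPU so the kernel does not stall on page faults.
    cudaMemPrefetchAsync(tensor, n * sizeof(float), device, stream);
    relu<<<(n + 255) / 256, 256, 0, stream>>>(tensor, n);

    // Prefetch back to the CPU before the host reads the result.
    cudaMemPrefetchAsync(tensor, n * sizeof(float), cudaCpuDeviceId, stream);
    cudaStreamSynchronize(stream);

    std::printf("tensor[1] = %f\n", tensor[1]);
    cudaFree(tensor);
    cudaStreamDestroy(stream);
    return 0;
}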