A modern primer on processing in memory
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …
Processing data where it makes sense: Enabling in-memory computation
Today's systems are overwhelmingly designed to move data to computation. This design
choice goes directly against at least three key trends in systems that cause performance …
DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks
Data movement between the CPU and main memory is a first-order obstacle against improv
ing performance, scalability, and energy efficiency in modern systems. Computer systems …
FIGARO: Improving system performance via fine-grained in-DRAM data relocation and caching
Main memory, composed of DRAM, is a performance bottleneck for many applications, due
to the high DRAM access latency. In-DRAM caches work to mitigate this latency by …
Neither more nor less: Optimizing thread-level parallelism for GPGPUs
General-purpose graphics processing units (GPGPUs) are at their best in accelerating
computation by exploiting abundant thread-level parallelism (TLP) offered by many classes …
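To make the notion of thread-level parallelism concrete, here is a minimal CUDA sketch (a toy vector-add kernel, not taken from the paper): the launch configuration of blocks and threads per block sets how much TLP the hardware warp schedulers have available to hide memory latency.

#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel: each thread handles one element, so the grid size
// directly determines the amount of thread-level parallelism exposed.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // The block size (256 threads here) and grid size control how many
    // warps can be resident per SM; hardware scheduling policies like the
    // one studied in the paper decide how much of this TLP to exploit.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}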
Research problems and opportunities in memory systems
O Mutlu, L Subramanian - Supercomputing frontiers and …, 2014 - superfri.susu.ru
The memory system is a fundamental performance and energy bottleneck in almost all
computing systems. Recent system design, application, and technology trends that require …
Mosaic: a GPU memory manager with application-transparent support for multiple page sizes
R Ausavarungnirun, J Landgraf, V Miller… - Proceedings of the 50th …, 2017 - dl.acm.org
Contemporary discrete GPUs support rich memory management features such as virtual
memory and demand paging. These features simplify GPU programming by providing a …
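The programmer-visible side of these features can be sketched with CUDA unified memory, where pages are migrated to the GPU on demand; the toy allocation below is an assumed illustration of demand paging as the application sees it, not code from the paper.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;   // first GPU access faults the page in on demand
}

int main() {
    const int n = 1 << 22;
    int* data;
    // Unified (managed) memory: the driver and GPU MMU demand-page this
    // buffer between host and device; page-size choices are transparent to
    // the application, which is the layer a manager like Mosaic targets.
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;

    touch<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();

    printf("data[42] = %d\n", data[42]);
    cudaFree(data);
    return 0;
}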
Improving GPGPU resource utilization through alternative thread block scheduling
High performance in GPGPU workloads is obtained by maximizing parallelism and fully
utilizing the available resources. The thousands of threads are assigned to each core in …
Divergence-aware warp scheduling
This paper uses hardware thread scheduling to improve the performance and energy
efficiency of divergent applications on GPUs. We propose Divergence-Aware Warp …
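As a concrete illustration of the divergence such a scheduler must cope with, consider a toy CUDA kernel (an assumed example, not from the paper) in which threads of the same warp take different branch paths and are therefore executed serially with lanes masked off.

#include <cstdio>
#include <cuda_runtime.h>

// Threads within a 32-thread warp disagree on the branch condition, so the
// warp executes both paths one after the other with some lanes inactive.
// Divergence-aware scheduling policies aim to reduce the cost of this.
__global__ void divergent(const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (in[i] % 2 == 0) {
        out[i] = in[i] * 2;       // taken by the even-valued lanes
    } else {
        out[i] = in[i] * in[i];   // taken by the odd-valued lanes
    }
}

int main() {
    const int n = 1024;
    int *in, *out;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(int));
    for (int i = 0; i < n; ++i) in[i] = i;

    divergent<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[2] = %d, out[3] = %d\n", out[2], out[3]);
    cudaFree(in); cudaFree(out);
    return 0;
}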
Coordinated static and dynamic cache bypassing for GPUs
The massive parallel architecture enables graphics processing units (GPUs) to boost
performance for a wide range of applications. Initially, GPUs only employ scratchpad …
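Cache bypassing also has a software-visible analogue: CUDA's cache-hint load intrinsics (such as __ldcg(), available on compute capability 3.2 and later) let a kernel fetch streaming data around the L1 cache. The sketch below illustrates that idea only; it is not the paper's coordinated static and dynamic scheme.

#include <cstdio>
#include <cuda_runtime.h>

// Streaming input is loaded with __ldcg(), which caches only in L2 and
// bypasses L1, leaving L1 capacity for data with more reuse. This is the
// manual counterpart of the bypassing decisions the paper automates.
__global__ void scale(const float* __restrict__ in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = __ldcg(&in[i]);   // load with the .cg hint: bypass L1
        out[i] = v * 0.5f;
    }
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    scale<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[10] = %f\n", out[10]);
    cudaFree(in); cudaFree(out);
    return 0;
}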