System, method, and computer program product for improving memory systems

MS Smith - US Patent 9,432,298, 2016 - Google Patents
H01L25/18—Assemblies consisting of a plurality of individual semiconductor or other solid
state devices; Multistep manufacturing processes thereof the devices being of types …

Transparent offloading and mapping (TOM): Enabling programmer-transparent near-data processing in GPU systems

K Hsieh, E Ebrahimi, G Kim, N Chatterjee… - ACM SIGARCH …, 2016 - dl.acm.org
Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-
chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to …

[BOOK][B] Memory systems: cache, DRAM, disk

B Jacob, D Wang, S Ng - 2010 - books.google.com
Is your memory hierarchy stopping your microprocessor from performing at the high level it
should be? Memory Systems: Cache, DRAM, Disk shows you how to resolve this problem …

A case for exploiting subarray-level parallelism (SALP) in DRAM

Y Kim, V Seshadri, D Lee, J Liu, O Mutlu - ACM SIGARCH Computer …, 2012 - dl.acm.org
Modern DRAMs have multiple banks to serve multiple memory requests in parallel.
However, when two requests go to the same bank, they have to be served serially …

Self-optimizing memory controllers: A reinforcement learning approach

E Ipek, O Mutlu, JF Martínez, R Caruana - ACM SIGARCH Computer …, 2008 - dl.acm.org
Efficiently utilizing off-chip DRAM bandwidth is a critical issue in designing cost-effective,
high-performance chip multiprocessors (CMPs). Conventional memory controllers deliver …

Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems

O Mutlu, T Moscibroda - ACM SIGARCH Computer Architecture News, 2008 - dl.acm.org
In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a
shared DRAM system, requests from a thread can not only delay requests from other threads …

ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers

Y Kim, D Han, O Mutlu… - HPCA-16 2010 The …, 2010 - ieeexplore.ieee.org
Modern chip multiprocessor (CMP) systems employ multiple memory controllers to control
access to main memory. The scheduling algorithm employed by these memory controllers …

Data reorganization in memory using 3D-stacked DRAM

B Akin, F Franchetti, JC Hoe - ACM SIGARCH Computer Architecture …, 2015 - dl.acm.org
In this paper we focus on common data reorganization operations such as shuffle,
pack/unpack, swap, transpose, and layout transformations. Although these operations …

Stall-time fair memory access scheduling for chip multiprocessors

O Mutlu, T Moscibroda - 40th Annual IEEE/ACM International …, 2007 - ieeexplore.ieee.org
DRAM memory is a major resource shared among cores in a chip multiprocessor (CMP)
system. Memory requests from different threads can interfere with each other. Existing …

ChargeCache: Reducing DRAM latency by exploiting row access locality

H Hassan, G Pekhimenko, N Vijaykumar… - … Symposium on High …, 2016 - ieeexplore.ieee.org
DRAM latency continues to be a critical bottleneck for system performance. In this work, we
develop a low-cost mechanism, called ChargeCache, that enables faster access to recently …