Affinity-based thread and data mapping in shared memory systems
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …
MemProf: A Memory Profiler for NUMA Multicore Systems
Modern multicore systems are based on a Non-Uniform Memory Access (NUMA) design.
Efficiently exploiting such architectures is notoriously complex for programmers. One of the …
Handling the problems and opportunities posed by multiple on-chip memory controllers
Modern processors such as Tilera's Tile64, Intel's Nehalem, and AMD's Opteron are
migrating memory controllers (MCs) on-chip, while maintaining a large, flat memory address …
NUMA-aware algorithms: the case of data shuffling.
In recent years, a new breed of non-uniform memory access (NUMA) systems has emerged:
multi-socket servers of multicores. This paper makes the case that data management …
Memory system performance in a NUMA multicore multiprocessor
Z Majo, TR Gross - Proceedings of the 4th Annual International …, 2011 - dl.acm.org
Modern multicore processors with an on-chip memory controller form the base for NUMA
(non-uniform memory architecture) multiprocessors. Each processor accesses part of the …
A tool to analyze the performance of multithreaded programs on NUMA architectures
X Liu, J Mellor-Crummey - ACM Sigplan Notices, 2014 - dl.acm.org
Almost all of today's microprocessors contain memory controllers and directly attach to
memory. Modern multiprocessor systems support non-uniform memory access (NUMA): it is …
A study on communication issues for systems-on-chip
Present days cores composing a system-on-chip might be interconnected by means of both
dedicated channels or shared buses. Nevertheless, future systems will have strong …
Locality-centric data and threadblock management for massive GPUs
Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip
will not be practical due to slowing growth in transistor density, low chip yields, and …
Switched real-time ethernet with earliest deadline first scheduling protocols and traffic handling
H Hoang, M Jonsson, U Hagstrom… - Proceedings 16th …, 2002 - ieeexplore.ieee.org
There is a strong interest of using the cheap and simple Ethernet technology for industrial
and embedded systems. This far, however, the lack of real-time services has prevented this …
A data-centric profiler for parallel programs
X Liu, J Mellor-Crummey - … of the International Conference on High …, 2013 - dl.acm.org
It is difficult to manually identify opportunities for enhancing data locality. To address this
problem, we extended the HPCToolkit performance tools to support data-centric profiling of …