Complexity-effective multicore coherence
Much of the complexity and overhead (directory, state bits, invalidations) of a typical
directory coherence implementation stems from the effort to make it "invisible" even to the …
System and method for simplifying cache coherence using multiple write policies
System and methods for cache coherence in a multi-core processing environment
having a local/shared cache hierarchy. The system includes multiple processor cores, a …
Coherence domain restriction on large scale systems
Designing massive scale cache coherence systems has been an elusive goal. Whether it be
on large-scale GPUs, future thousand-core chips, or across million-core warehouse scale …
Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies
Hierarchical clustered cache designs are becoming an appealing alternative for multicores.
Grouping cores and their caches in clusters reduces network congestion by localizing traffic …
Universitat politecnica de Valencia
A García - Ingeniería del agua, 2014 - academia.edu
Embedded devices are becoming more and more present everywhere. Moreover mobile
devices are becoming also more computationally powerful. These embedded architectures …
Selective replication in memory-side GPU caches
Data-intensive applications put immense strain on the memory systems of Graphics
Processing Units (GPUs). To cater to this need, GPU memory systems distribute requests …
Temporal-aware mechanism to detect private data in chip multiprocessors
Most of the data referenced by sequential and parallel applications running in current chip
multiprocessors are referenced by only one thread and can be considered as private data. A …
Nexus: A new approach to replication in distributed shared caches
PA Tsai, N Beckmann… - 2017 26th International …, 2017 - ieeexplore.ieee.org
Last-level caches are increasingly distributed, consisting of many small banks. To perform
well, most accesses must be served by banks near requesting cores. An attractive approach …
A dual-consistency cache coherence protocol
A Ros, A Jimborean - 2015 IEEE International Parallel and …, 2015 - ieeexplore.ieee.org
Weak memory consistency models can maximize system performance by enabling
hardware and compiler optimizations, but increase programming complexity since they do …
Efficient TLB-based detection of private pages in chip multiprocessors
Most of the data referenced by sequential and parallel applications running in current chip
multiprocessors are referenced by a single thread, i.e., private. Recent proposals leverage …