Complexity-effective multicore coherence

A Ros, S Kaxiras - Proceedings of the 21st international conference on …, 2012 - dl.acm.org
Much of the complexity and overhead (directory, state bits, invalidations) of a typical
directory coherence implementation stems from the effort to make it "invisible" even to the …

System and method for simplifying cache coherence using multiple write policies

S Kaxiras, A Ros - US Patent 9,274,960, 2016 - Google Patents
Abstract: System and methods for cache coherence in a multi-core processing environment
having a local/shared cache hierarchy. The system includes multiple processor cores, a …

Coherence domain restriction on large scale systems

Y Fu, TM Nguyen, D Wentzlaff - … of the 48th International Symposium on …, 2015 - dl.acm.org
Designing massive scale cache coherence systems has been an elusive goal. Whether it be
on large-scale GPUs, future thousand-core chips, or across million-core warehouse scale …

Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies

A Ros, M Davari, S Kaxiras - 2015 IEEE 21st International …, 2015 - ieeexplore.ieee.org
Hierarchical clustered cache designs are becoming an appealing alternative for multicores.
Grouping cores and their caches in clusters reduces network congestion by localizing traffic …

Universitat politecnica de Valencia

A García - Ingeniería del agua, 2014 - academia.edu
Embedded devices are becoming more and more present everywhere. Moreover, mobile
devices are also becoming more computationally powerful. These embedded architectures …

Selective replication in memory-side GPU caches

X Zhao, M Jahre, L Eeckhout - 2020 53rd Annual IEEE/ACM …, 2020 - ieeexplore.ieee.org
Data-intensive applications put immense strain on the memory systems of Graphics
Processing Units (GPUs). To cater to this need, GPU memory systems distribute requests …

Temporal-aware mechanism to detect private data in chip multiprocessors

A Ros, B Cuesta, ME Gómez… - … on Parallel Processing, 2013 - ieeexplore.ieee.org
Most of the data referenced by sequential and parallel applications running in current chip
multiprocessors are referenced by only one thread and can be considered as private data. A …

Nexus: A new approach to replication in distributed shared caches

PA Tsai, N Beckmann… - 2017 26th International …, 2017 - ieeexplore.ieee.org
Last-level caches are increasingly distributed, consisting of many small banks. To perform
well, most accesses must be served by banks near requesting cores. An attractive approach …

A dual-consistency cache coherence protocol

A Ros, A Jimborean - 2015 IEEE International Parallel and …, 2015 - ieeexplore.ieee.org
Weak memory consistency models can maximize system performance by enabling
hardware and compiler optimizations, but increase programming complexity since they do …

Efficient TLB-based detection of private pages in chip multiprocessors

A Esteve, A Ros, ME Gómez, A Robles… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Most of the data referenced by sequential and parallel applications running in current chip
multiprocessors are referenced by a single thread, i.e., private. Recent proposals leverage …