POPS: Coherence protocol optimization for both private and shared data

A Ros, S Kaxiras - Proceedings of the 21st international conference on …, 2012 - dl.acm.org

Much of the complexity and overhead (directory, state bits, invalidations) of a typical
directory coherence implementation stems from the effort to make it "invisible" even to the …

Cited by 166 Related articles All 19 versions

[PDF] googleapis.com

System and method for simplifying cache coherence using multiple write policies

S Kaxiras, A Ros - US Patent 9,274,960, 2016 - Google Patents

Abstract: System and methods for cache coherence in a multi-core processing environment
having a local/shared cache hierarchy. The system includes multiple processor cores, a …

Cited by 107 Related articles All 4 versions

[PDF] acm.org

Coherence domain restriction on large scale systems

Y Fu, TM Nguyen, D Wentzlaff - … of the 48th International Symposium on …, 2015 - dl.acm.org

Designing massive scale cache coherence systems has been an elusive goal. Whether it be
on large-scale GPUs, future thousand-core chips, or across million-core warehouse scale …

Cited by 60 Related articles All 9 versions

[HTML] diva-portal.org

Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies

A Ros, M Davari, S Kaxiras - 2015 IEEE 21st International …, 2015 - ieeexplore.ieee.org

Hierarchical clustered cache designs are becoming an appealing alternative for multicores.
Grouping cores and their caches in clusters reduces network congestion by localizing traffic …

Cited by 46 Related articles All 12 versions

[PDF] academia.edu

[PDF] Universitat politecnica de Valencia

A García - Ingeniería del agua, 2014 - academia.edu

Embedded devices are becoming more and more present everywhere. Moreover mobile
devices are becoming also more computationally powerful. These embedded architectures …

Cited by 34 Related articles All 4 versions

[PDF] ntnu.no

Selective replication in memory-side GPU caches

X Zhao, M Jahre, L Eeckhout - 2020 53rd Annual IEEE/ACM …, 2020 - ieeexplore.ieee.org

Data-intensive applications put immense strain on the memory systems of Graphics
Processing Units (GPUs). To cater to this need, GPU memory systems distribute requests …

Cited by 12 Related articles All 8 versions

[PDF] upv.es

Temporal-aware mechanism to detect private data in chip multiprocessors

A Ros, B Cuesta, ME Gómez… - … on Parallel Processing, 2013 - ieeexplore.ieee.org

Most of the data referenced by sequential and parallel applications running in current chip
multiprocessors are referenced by only one thread and can be considered as private data. A …

Cited by 35 Related articles All 11 versions

[PDF] mit.edu

Nexus: A new approach to replication in distributed shared caches

PA Tsai, N Beckmann… - 2017 26th International …, 2017 - ieeexplore.ieee.org

Last-level caches are increasingly distributed, consisting of many small banks. To perform
well, most accesses must be served by banks near requesting cores. An attractive approach …

Cited by 19 Related articles All 11 versions

[PDF] diva-portal.org

A dual-consistency cache coherence protocol

A Ros, A Jimborean - 2015 IEEE International Parallel and …, 2015 - ieeexplore.ieee.org

Weak memory consistency models can maximize system performance by enabling
hardware and compiler optimizations, but increase programming complexity since they do …

Cited by 27 Related articles All 12 versions

[PDF] upv.es

Efficient TLB-based detection of private pages in chip multiprocessors

A Esteve, A Ros, ME Gómez, A Robles… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org

Most of the data referenced by sequential and parallel applications running in current chip
multiprocessors are referenced by a single thread, i.e., private. Recent proposals leverage …

Cited by 26 Related articles All 4 versions