Cuckoo directory: A scalable directory for many-core systems

M Ferdman, P Lotfi-Kamran, K Balet… - 2011 IEEE 17th …, 2011 - ieeexplore.ieee.org
Growing core counts have highlighted the need for scalable on-chip coherence
mechanisms. The increase in the number of on-chip cores exposes the energy and area …

Complexity-effective multicore coherence

A Ros, S Kaxiras - Proceedings of the 21st international conference on …, 2012 - dl.acm.org
Much of the complexity and overhead (directory, state bits, invalidations) of a typical
directory coherence implementation stems from the effort to make it "invisible" even to the …

In-network cache coherence

N Eisley, LS Peh, L Shang - 2006 39th Annual IEEE/ACM …, 2006 - ieeexplore.ieee.org
With the trend towards an increasing number of processor cores in future chip architectures,
scalable directory-based protocols for maintaining cache coherence will be needed …

System and method for simplifying cache coherence using multiple write policies

S Kaxiras, A Ros - US Patent 9,274,960, 2016 - Google Patents
System and methods for cache coherence in a multi-core processing environment
having a local/shared cache hierarchy. The system includes multiple processor cores, a …

Generating efficient data movement code for heterogeneous architectures with distributed-memory

R Dathathri, C Reddy, T Ramashekar… - Proceedings of the …, 2013 - ieeexplore.ieee.org
Programming for parallel architectures that do not have a shared address space is extremely
difficult due to the need for explicit communication between memories of different compute …

Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture

ME Acacio, J González, JM García… - SC'02: Proceedings of …, 2002 - ieeexplore.ieee.org
Cache misses for which data must be obtained from a remote cache (cache-to-cache
transfer misses) account for an important fraction of the total miss rate. Unfortunately, cc …

A two-level directory architecture for highly scalable cc-NUMA multiprocessors

ME Acacio, J González, JM Garcia… - IEEE Transactions on …, 2005 - ieeexplore.ieee.org
One important issue the designer of a scalable shared-memory multiprocessor must deal
with is the amount of extra memory required to store the directory information. It is desirable …

Distributed cooperative caching

E Herrero, J González, R Canal - … of the 17th international conference on …, 2008 - dl.acm.org
This paper presents Distributed Cooperative Caching, a scalable and energy-efficient
scheme to manage chip multiprocessor (CMP) cache resources. The proposed configuration …

Hierarchical cache directory for CMP

SL Guo, HX Wang, YB Xue, CM Li, DS Wang - Journal of Computer …, 2010 - Springer
As more processing cores are integrated into one chip and feature size continues to shrink,
the average access latency for remote nodes using a directory-based coherence protocol …

Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution

R Bera, A Ranganathan, J Rakshit… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Load instructions often limit instruction-level parallelism (ILP) in modern processors due to
data and resource dependences they cause. Prior techniques like Load Value Prediction …