TD-NUCA: runtime driven management of NUCA caches in task dataflow programming models
In high performance processors, the design of on-chip memory hierarchies is crucial for
performance and energy efficiency. Current processors rely on large shared Non-Uniform …
Fine-grain data classification to filter token coherence traffic
Snoop-based cache coherence protocols perform well in small-scale systems by enabling
low latency cache-to-cache data transfers in just two-hop coherence transactions. However …
Novel techniques to improve the performance and the energy of vector architectures
A Barredo Ferreira - 2021 - upcommons.upc.edu
The rate of annual data generation grows exponentially. At the same time, there is a high
demand to analyze that information quickly. In the past, every processor generation came …
Efficient classification of private memory blocks
BR Upadhyay, A Ros, J Shah - Journal of Parallel and Distributed …, 2021 - Elsevier
Shared memory architectures are pervasive in the multicore technology era. Still, sequential
and parallel applications use most of the data as private in a multicore system. Recent …
TLB-based block-grain classification of private data
BR Upadhyay, A Ros, NS Murty - 2020 28th Euromicro …, 2020 - ieeexplore.ieee.org
Sequential and parallel applications use most of the data as private in a multi-core system.
Recent proposals made use of this observation to reduce the area of the coherence …
Exploiting data locality in cache-coherent NUMA systems
I Sánchez Barrera - 2022 - upcommons.upc.edu
The end of Dennard scaling has caused a stagnation of the clock frequency in computers.
To overcome this issue, in the last two decades vendors have been integrating larger …
Towards resource-aware computing for task-based runtimes and parallel architectures
D Chasapis - 2019 - upcommons.upc.edu
Current large scale systems show increasing power demands, to the point that it has
become a huge strain on facilities and budgets. The increasing restrictions in terms of power …
Exploiting task-based programming models for resilience
L Jaulmes - 2019 - upcommons.upc.edu
Hardware errors become more common as silicon technologies shrink and become more
vulnerable, especially in memory cells, which are the most exposed to errors. Permanent …
Runtime-assisted optimizations in the on-chip memory hierarchy
V Dimić - 2020 - upcommons.upc.edu
Following Moore's Law, the number of transistors on chip has been increasing
exponentially, which has led to the increasing complexity of modern processors. As a result …
Application of a default shared state cache coherency protocol
M Malewicki, T McGee, MS Woodacre - US Patent 11,687,459, 2023 - Google Patents
Example implementations relate to cache coherency protocols as applied to a memory block
range. Exclusive ownership of a range of blocks of memory in a default shared state may be …