Task scheduling techniques for asymmetric multi-core systems
As performance and energy efficiency have become the main challenges for next-generation high-performance computing, asymmetric multi-core architectures can provide …
MUSA: a multi-level simulation approach for next-generation HPC machines
T Grass, C Allande, A Armejach, A Rico… - SC'16: Proceedings …, 2016 - ieeexplore.ieee.org
The complexity of High Performance Computing (HPC) systems is increasing in the number
of components and their heterogeneity. Interactions between software and hardware involve …
Reducing data movement on large shared memory systems by exploiting computation dependencies
Shared memory systems are becoming increasingly complex as they typically integrate
several storage devices. That brings different access latencies or bandwidth rates …
Architectural support for task dependence management with flexible software scheduling
The growing complexity of multi-core architectures has motivated a wide range of software
mechanisms to improve the orchestration of parallel executions. Task parallelism has …
General purpose task-dependence management hardware for task-based dataflow programming models
Task-based programming models such as OpenMP, Intel TBB and OmpSs offer the
possibility of expressing dependences among tasks to drive their execution at runtime …
CATA: criticality aware task acceleration for multicore processors
Managing criticality in task-based programming models opens a wide range of performance
and power optimization opportunities in future manycore systems. Criticality aware task …
ATM: approximate task memoization in the runtime system
Redundant computations appear during the execution of real programs. Multiple factors
contribute to these unnecessary computations, such as repetitive inputs and patterns, calling …
Reducing cache coherence traffic with a NUMA-aware runtime approach
P Caheny, L Alvarez, S Derradji… - … on Parallel and …, 2017 - ieeexplore.ieee.org
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the
benefits they provide for scaling core count and memory capacity. Also, the flat memory …
Runtime-guided management of stacked DRAM memories in task parallel programs
Stacked DRAM memories have become a reality in High-Performance Computing (HPC)
architectures. These memories provide much higher bandwidth while consuming less power …
TD-NUCA: runtime driven management of NUCA caches in task dataflow programming models
In high performance processors, the design of on-chip memory hierarchies is crucial for
performance and energy efficiency. Current processors rely on large shared Non-Uniform …