Architectural support for task dependence management with flexible software scheduling
The growing complexity of multi-core architectures has motivated a wide range of software
mechanisms to improve the orchestration of parallel executions. Task parallelism has …
ATM: approximate task memoization in the runtime system
Redundant computations appear during the execution of real programs. Multiple factors
contribute to these unnecessary computations, such as repetitive inputs and patterns, calling …
Runtime-guided management of stacked DRAM memories in task parallel programs
Stacked DRAM memories have become a reality in High-Performance Computing (HPC)
architectures. These memories provide much higher bandwidth while consuming less power …
TD-NUCA: runtime driven management of NUCA caches in task dataflow programming models
In high performance processors, the design of on-chip memory hierarchies is crucial for
performance and energy efficiency. Current processors rely on large shared Non-Uniform …
RADAR: Runtime-assisted dead region management for last-level caches
M Manivannan, V Papaefstathiou… - … Symposium on High …, 2016 - ieeexplore.ieee.org
Last-level caches (LLCs) bridge the processor/memory speed gap and reduce energy
consumed per access. Unfortunately, LLCs are poorly utilized because of the relatively large …
ParalOS: A scheduling & memory management framework for heterogeneous VPUs
Embedded systems are presented today with the challenge of a very rapidly evolving
application diversity followed by increased programming and computational complexity …
Runtime-assisted cache coherence deactivation in task parallel programs
With increasing core counts, the scalability of directory-based cache coherence has become
a challenging problem. To reduce the area and power needs of the directory, recent …
Design-time memory subsystem optimization for low-power multi-core embedded systems
M Strobel, M Radetzki - … multicore/many-core systems-on-chip …, 2019 - ieeexplore.ieee.org
Embedded multi-core systems are increasingly in use. As established single-core design
methodologies are often not applicable out of the box, novel design-time optimization …
Explicit data layout management for autotuning exploration on complex memory topologies
The memory topology of high-performance computing platforms is becoming more complex.
Future exascale platforms in particular are expected to feature multiple types of memory …
A visual analysis on recognizability and discriminability of onomatopoeia words with DCNN features
In this paper, we examine the relation between onomatopoeia and images using a large
number of Web images. The objective of this paper is to examine if the images …