Mapping parallelism to multi-cores: a machine learning based approach

Z Wang, MFP O'Boyle - Proceedings of the 14th ACM SIGPLAN …, 2009 - dl.acm.org
The efficient mapping of program parallelism to multi-core processors is highly dependent
on the underlying architecture. This paper proposes a portable and automatic compiler …

A framework for end-to-end simulation of high-performance computing systems

WE Denzel, J Li, P Walker, Y Jin - Simulation, 2010 - journals.sagepub.com
We present an end-to-end simulation framework that is capable of simulating High-
Performance Computing (HPC) systems with hundreds of thousands of interconnected …

Characteristics of workloads used in high performance and technical computing

R Cheveresan, M Ramsay, C Feucht… - Proceedings of the 21st …, 2007 - dl.acm.org
This paper provides a systematic comparison of various characteristics of computationally-
intensive workloads. Our analysis focuses on standard HPC benchmarks and representative …

An instrumentation approach for hardware-agnostic software characterization

A Anghel, LM Vasilescu, R Jongerius… - Proceedings of the 12th …, 2015 - dl.acm.org
Simulators and empirical profiling data are often used to understand how suitable a specific
hardware architecture is for an application. However, simulators can be slow, and empirical …

Predicting HPC parallel program performance based on LLVM compiler

W Zhang, M Hao, M Snir - Cluster Computing, 2017 - Springer
Performance prediction of parallel program plays key roles in many areas, such as parallel
system design, parallel program optimization, and parallel system procurement. Accurate …

Runtime scheduling of dynamic parallelism on accelerator-based multi-core systems

F Blagojevic, DS Nikolopoulos, A Stamatakis… - Parallel Computing, 2007 - Elsevier
We explore runtime mechanisms and policies for scheduling dynamic multi-grain parallelism
on heterogeneous multi-core processors. Heterogeneous multi-core processors integrate …

Modeling multigrain parallelism on heterogeneous multi-core processors: A case study of the Cell BE

F Blagojevic, X Feng, KW Cameron… - … and Compilers: Third …, 2008 - Springer
Heterogeneous multi-core processors invest the most significant portion of their transistor
budget in customized “accelerator” cores, while using a small number of conventional low …

Software probes: Towards a quick method for machine characterization and application performance prediction

A Strube, D Rexachs, E Luque - 2008 International Symposium …, 2008 - ieeexplore.ieee.org
Computers perform different applications in different ways. To characterize an application
performance into a machine, the usual method is a throughout execution of it. This work is a …

[BOOK][B] A compile-time OpenMP cost model

C Liao - 2007 - search.proquest.com
OpenMP is a de facto API for parallel programming in C/C++ and Fortran on shared memory
and distributed shared memory platforms. It is also being increasingly used with MPI to form …

Scaling application properties to exascale

G Mariani, A Anghel, R Jongerius… - Proceedings of the 12th …, 2015 - dl.acm.org
Exascale computing systems will execute computationally intensive tasks on unprecedented
amounts of data. Tuning the design of such systems for a specific application or for an …