15+ years of joint parallel application performance analysis/tools training with Scalasca/Score-P and Paraver/Extrae toolsets

BJN Wylie, J Giménez, C Feld, M Geimer, G Llort… - Future Generation …, 2024 - Elsevier
The diverse landscape of distributed heterogeneous computer systems currently available
and being created to address computational challenges with the highest performance …

DROM: Enabling efficient and effortless malleability for resource managers

M D'Amico, M Garcia-Gasulla, V López… - … Proceedings of the …, 2018 - dl.acm.org
In the design of future HPC systems, research in resource management is showing an
increasing interest in a more dynamic control of the available resources. It has been proven …

Software Development Vehicles to enable extended and early co-design: a RISC-V and HPC case of study

F Mantovani, P Vizcaino, F Banchelli… - … Conference on High …, 2023 - Springer
Prototyping HPC systems with low-to-mid technology readiness level (TRL) systems is
critical for providing feedback to hardware designers, the system software team (e.g. …

sLASs: A fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs (LASs Library)

P Valero-Lara, S Catalán, X Martorell, T Usui… - Journal of Parallel and …, 2020 - Elsevier
In this work we have implemented a novel Linear Algebra Library on top of the task-based
runtime OmpSs-2. We have used some of the most advanced OmpSs-2 features; weak …

Matching application signatures for performance predictions using a single execution

A Jayakumar, P Murali… - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
Performance predictions for large problem sizes and processors using limited small scale
runs are useful for a variety of purposes including scalability projections, and help in …

Advanced performance analysis of HPC workloads on Cavium ThunderX

E Calore, F Mantovani, D Ruiz - 2018 International Conference …, 2018 - ieeexplore.ieee.org
The interest towards Arm based platforms as HPC solutions increased significantly during
the last 5 years. In this paper we show that, in contrast to the early days of pioneer tests …

Towards an auto-tuned and task-based SpMV (LASs Library)

S Catalán, T Usui, L Toledo, X Martorell… - OpenMP: Portable Multi …, 2020 - Springer
We present a novel approach to parallelize the SpMV kernel included in LASs (Linear
Algebra routines on OmpSs) library, after a deep review and analysis of several well-known …

A portable coding strategy to exploit vectorization on combustion simulations

F Banchelli, G Oyarzun, M Garcia-Gasulla… - Computers & …, 2024 - Elsevier
The complexity of combustion simulations demands the latest high-performance computing
tools to accelerate its time-to-solution results. A current trend on HPC systems is the …

A fast solver for large tridiagonal systems on multi-core processors (LASs Library)

P Valero-Lara, D Andrade, R Sirvent, J Labarta… - IEEE …, 2019 - ieeexplore.ieee.org
Many problems of industrial and scientific interest require the solving of tridiagonal linear
systems. This paper presents several implementations for the parallel solving of large …

MPI+OpenMP tasking scalability for multi-morphology simulations of the human brain

P Valero-Lara, R Sirvent, AJ Peña, J Labarta - Parallel Computing, 2019 - Elsevier
The simulation of the behavior of the human brain is one of the most ambitious challenges
today with a non-end of important applications. We can find many different initiatives in the …