15+ years of joint parallel application performance analysis/tools training with Scalasca/Score-P and Paraver/Extrae toolsets
BJN Wylie, J Giménez, C Feld, M Geimer, G Llort… - Future Generation …, 2024 - Elsevier
The diverse landscape of distributed heterogeneous computer systems currently available
and being created to address computational challenges with the highest performance …
DROM: Enabling efficient and effortless malleability for resource managers
In the design of future HPC systems, research in resource management is showing an
increasing interest in a more dynamic control of the available resources. It has been proven …
Software Development Vehicles to enable extended and early co-design: a RISC-V and HPC case of study
Prototyping HPC systems with low-to-mid technology readiness level (TRL) systems is
critical for providing feedback to hardware designers, the system software team (e.g. …
sLASs: A fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs (LASs Library)
In this work we have implemented a novel Linear Algebra Library on top of the task-based
runtime OmpSs-2. We have used some of the most advanced OmpSs-2 features; weak …
Matching application signatures for performance predictions using a single execution
A Jayakumar, P Murali… - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
Performance predictions for large problem sizes and processors using limited small scale
runs are useful for a variety of purposes including scalability projections, and help in …
Advanced performance analysis of HPC workloads on Cavium ThunderX
E Calore, F Mantovani, D Ruiz - 2018 International Conference …, 2018 - ieeexplore.ieee.org
The interest towards Arm based platforms as HPC solutions increased significantly during
the last 5 years. In this paper we show that, in contrast to the early days of pioneer tests …
Towards an auto-tuned and task-based SpMV (LASs library)
We present a novel approach to parallelize the SpMV kernel included in LASs (Linear
Algebra routines on OmpSs) library, after a deep review and analysis of several well-known …
A portable coding strategy to exploit vectorization on combustion simulations
F Banchelli, G Oyarzun, M Garcia-Gasulla… - Computers & …, 2024 - Elsevier
The complexity of combustion simulations demands the latest high-performance computing
tools to accelerate its time-to-solution results. A current trend on HPC systems is the …
A fast solver for large tridiagonal systems on multi-core processors (LASs library)
Many problems of industrial and scientific interest require the solving of tridiagonal linear
systems. This paper presents several implementations for the parallel solving of large …
MPI+OpenMP tasking scalability for multi-morphology simulations of the human brain
The simulation of the behavior of the human brain is one of the most ambitious challenges
today with a non-end of important applications. We can find many different initiatives in the …