Statistical structure analysis in MRI brain tumor segmentation

X Xuan, Q Liao - … Conference on Image and Graphics (ICIG …, 2007 - ieeexplore.ieee.org
Automated MRI (Magnetic Resonance Imaging) brain tumor segmentation is a difficult task
due to the variance and complexity of tumors. In this paper, a statistical structure analysis …

LaRIS: targeting portability and productivity for LAPACK codes on extreme heterogeneous systems by using IRIS

MAH Monil, NR Miniskar, FY Liu… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
In keeping with the trend of heterogeneity in high-performance computing, hardware
manufacturers and vendors are developing new architectures and associated software …

Adaptive parallel applications: from shared memory architectures to fog computing (2002–2022)

G Galante, R da Rosa Righi - Cluster Computing, 2022 - Springer
The evolution of parallel architectures points to dynamic environments where the number of
available resources or configurations may vary during the execution of applications. This …

sLASs: A fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs (LASs Library)

P Valero-Lara, S Catalán, X Martorell, T Usui… - Journal of Parallel and …, 2020 - Elsevier
In this work we have implemented a novel Linear Algebra Library on top of the task-based
runtime OmpSs-2. We have used some of the most advanced OmpSs-2 features; weak …

Machine-Learning-Driven Runtime Optimization of BLAS Level 3 on Modern Multi-Core Systems

Y Xia, GMJ Barca - 2024 IEEE International Parallel and …, 2024 - ieeexplore.ieee.org
BLAS Level 3 operations are essential for scientific computing, but finding the optimal
number of threads for multi-threaded implementations on modern multi-core systems is …

Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors

R Rodríguez-Sánchez, A Castelló… - … Journal of High …, 2024 - journals.sagepub.com
Malleability is defined as the ability to vary the degree of parallelism at runtime, and is
regarded as a means to improve core occupation on state-of-the-art multicore processors …

A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication

Y Xia, M De La Pierre, AS Barnard… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
The GEneral Matrix Multiplication (GEMM) is one of the essential algorithms in scientific
computing. Single-thread GEMM implementations are well-optimised with techniques like …

Towards a malleable TensorFlow implementation

LA Libutti, FD Igual, L Piñuel, L De Giusti… - Conference on Cloud …, 2020 - Springer
The TensorFlow framework was designed since its inception to provide multi-thread
capabilities, extended with hardware accelerator support to leverage the potential of modern …

Static versus dynamic task scheduling of the LU factorization on ARM big.LITTLE architectures

S Catalán, R Rodríguez-Sánchez… - 2017 IEEE …, 2017 - ieeexplore.ieee.org
We investigate several parallel algorithmic variants of the LU factorization with partial
pivoting (LUpp) that trade off the exploitation of increasing levels of task-parallelism in …

Programming parallel dense matrix factorizations with look-ahead and OpenMP

S Catalán, A Castelló, FD Igual… - Cluster …, 2020 - Springer
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms,
using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts …