| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| The design and performance of batched BLAS on modern high-performance computing systems | J Dongarra, S Hammarling, NJ Higham, SD Relton, P Valero-Lara, ... | Procedia Computer Science 108, 495-504, 2017 | 86 | 2017 |
| Performance evaluation of cuDNN convolution algorithms on NVIDIA Volta GPUs | M Jorda, P Valero-Lara, AJ Pena | IEEE Access 7, 70461-70473, 2019 | 72 | 2019 |
| Accelerating fluid–solid simulations (Lattice-Boltzmann & Immersed-Boundary) on heterogeneous architectures | P Valero-Lara, FD Igual, M Prieto-Matías, A Pinelli, J Favier | Journal of Computational Science 10, 249-261, 2015 | 51 | 2015 |
| Fast finite difference Poisson solvers on heterogeneous architectures | P Valero-Lara, A Pinelli, M Prieto-Matias | Computer Physics Communications 185 (4), 1265-1272, 2014 | 47 | 2014 |
| Heterogeneous CPU+GPU approaches for mesh refinement over Lattice-Boltzmann simulations | P Valero-Lara, J Jansson | Concurrency and Computation: Practice and Experience 29 (7), e3919, 2017 | 37 | 2017 |
| A proposed API for batched basic linear algebra subprograms | J Dongarra, I Duff, M Gates, A Haidar, S Hammarling, NJ Higham, J Hogg, ... | Manchester Institute for Mathematical Sciences, University of Manchester, 2016 | 33 | 2016 |
| Accelerating solid-fluid interaction using Lattice-Boltzmann and immersed boundary coupled simulations on heterogeneous platforms | P Valero-Lara, A Pinelli, M Prieto-Matias | Procedia Computer Science 29, 50-61, 2014 | 32 | 2014 |
| cuThomasBatch and cuThomasVBatch, CUDA routines to compute batch of tridiagonal systems on NVIDIA GPUs | P Valero-Lara, I Martínez-Pérez, R Sirvent, X Martorell, AJ Peña | Concurrency and Computation: Practice and Experience 30 (24), e4909, 2018 | 31 | 2018 |
| Block tridiagonal solvers on heterogeneous architectures | P Valero-Lara, A Pinelli, J Favier, MP Matias | 2012 IEEE 10th International Symposium on Parallel and Distributed …, 2012 | 31 | 2012 |
| cuHinesBatch: Solving multiple Hines systems on GPUs Human Brain Project | P Valero-Lara, I Martínez-Pérez, AJ Pena, X Martorell, R Sirvent, ... | Procedia Computer Science 108, 566-575, 2017 | 29 | 2017 |
| Accelerating solid–fluid interaction based on the immersed boundary method on multicore and GPU architectures | P Valero-Lara | The Journal of Supercomputing 70 (2), 799-815, 2014 | 26 | 2014 |
| Multi-GPU acceleration of DARTEL (early detection of Alzheimer) | P Valero-Lara | 2014 IEEE International Conference on Cluster Computing (CLUSTER), 346-354, 2014 | 24 | 2014 |
| Similarity search implementations for multi-core and many-core processors | R Uribe-Paredes, P Valero-Lara, E Arias, JL Sánchez, D Cazorla | 2011 International Conference on High Performance Computing & Simulation …, 2011 | 24 | 2011 |
| NVIDIA GPUs scalability to solve multiple (batch) tridiagonal systems implementation of cuThomasBatch | P Valero-Lara, I Martínez-Pérez, R Sirvent, X Martorell, AJ Peña | International Conference on Parallel Processing and Applied Mathematics, 243-253, 2017 | 23 | 2017 |
| A non-uniform staggered Cartesian grid approach for Lattice-Boltzmann method | P Valero-Lara, J Jansson | Procedia Computer Science 51, 296-305, 2015 | 23 | 2015 |
| Reducing memory requirements for large size LBM simulations on GPUs | P Valero-Lara | Concurrency and Computation: Practice and Experience 29 (24), e4221, 2017 | 22 | 2017 |
| Variable batched DGEMM | P Valero-Lara, I Martínez-Pérez, S Mateo, R Sirvent, V Beltran, X Martorell, ... | 2018 26th Euromicro International Conference on Parallel, Distributed and …, 2018 | 20 | 2018 |
| Many-task computing on many-core architectures | P Valero-Lara, P Nookala, FL Pelayo, J Jansson, S Dimitropoulos, I Raicu | Scalable Computing: Practice and Experience 17 (1), 32-46, 2016 | 19 | 2016 |
| A GPU-based implementation for range queries on Spaghettis data structure | R Uribe-Paredes, P Valero-Lara, E Arias, JL Sánchez, D Cazorla | Computational Science and Its Applications-ICCSA 2011: International …, 2011 | 18 | 2011 |
| sLASs: A fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs (LASs Library) | P Valero-Lara, S Catalán, X Martorell, T Usui, J Labarta | Journal of Parallel and Distributed Computing 138, 153-171, 2020 | 17 | 2020 |