rCUDA: Reducing the number of GPU-based accelerators in high performance clusters J Duato, AJ Pena, F Silla, R Mayo, ES Quintana-Ortí 2010 International Conference on High Performance Computing & Simulation …, 2010 | 402 | 2010 |
A Complete and Efficient CUDA-Sharing Solution for HPC Clusters AJ Peña, C Reaño, F Silla, R Mayo, ES Quintana-Ortí, J Duato Parallel Computing 40 (10), 574-588, 2014 | 134 | 2014 |
Chai: collaborative heterogeneous applications for integrated-architectures J Gómez-Luna, I El Hajj, LW Chang, V García-Flores, SG de Gonzalo, ... 2017 IEEE International Symposium on Performance Analysis of Systems and …, 2017 | 113 | 2017 |
Enabling CUDA acceleration within virtual machines using rCUDA J Duato, AJ Pena, F Silla, JC Fernandez, R Mayo, ES Quintana-Orti High Performance Computing (HiPC), 2011 18th International Conference on, 1-10, 2011 | 110 | 2011 |
An efficient implementation of GPU virtualization in high performance clusters J Duato, FD Igual, R Mayo, AJ Peña, ES Quintana-Ortí, F Silla Euro-Par 2009–Parallel Processing Workshops, 385-394, 2010 | 96 | 2010 |
Performance evaluation of cuDNN convolution algorithms on NVIDIA Volta GPUs M Jorda, P Valero-Lara, AJ Pena IEEE Access 7, 70461-70473, 2019 | 72 | 2019 |
MPICH User’s Guide A Amer, P Balaji, W Bland, W Gropp, R Latham, H Lu, L Oden, AJ Pena, ... Version, 2015 | 71* | 2015 |
Performance of CUDA virtualized remote GPUs in high performance clusters J Duato, AJ Pena, F Silla, R Mayo, ES Quintana-Orti Parallel Processing (ICPP), 2011 International Conference on, 365-374, 2011 | 70 | 2011 |
MT-MPI: multithreaded MPI for many-core environments M Si, AJ Peña, P Balaji, M Takagi, Y Ishikawa Proceedings of the 28th ACM international conference on Supercomputing, 125-134, 2014 | 68 | 2014 |
Casper: An Asynchronous Progress Model for MPI RMA on Many-Core Architectures M Si, AJ Pena, J Hammond, P Balaji, M Takagi, Y Ishikawa 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2015 | 60 | 2015 |
Automating the Application Data Placement in Hybrid Memory Systems H Servat, AJ Pena, G Llort, E Mercadal, HC Hoppe, J Labarta Cluster Computing (CLUSTER), 2017 IEEE International Conference on, 126-136, 2017 | 58 | 2017 |
Toward the Efficient Use of Multiple Explicitly Managed Memory Subsystems AJ Pena, P Balaji IEEE Cluster 2014, 2014 | 52 | 2014 |
CU2rCU: Towards the complete rCUDA remote GPU virtualization and sharing solution C Reaño, AJ Peña, F Silla, J Duato, R Mayo, ES Quintana-Orti High Performance Computing (HiPC), 2012 19th International Conference on, 1-10, 2012 | 51 | 2012 |
Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization C Reano, R Mayo, ES Quintana-Ortí, F Silla, J Duato, AJ Pena IEEE Cluster 2013, 2013 | 48 | 2013 |
Integrating blocking and non-blocking MPI primitives with task-based programming models K Sala, X Teruel, JM Perez, AJ Peña, V Beltran, J Labarta Parallel Computing 85, 153-166, 2019 | 45 | 2019 |
Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications V García, J Gomez-Luna, T Grass, A Rico, E Ayguade, AJ Pena 2016 IEEE International Symposium on Workload Characterization (IISWC), 1-10, 2016 | 44 | 2016 |
Exploring the Vision Processing Unit as Co-Processor for Inference S Rivas-Gomez, AJ Pena, D Moloney, E Laure, S Markidis 2018 IEEE International Parallel and Distributed Processing Symposium …, 2018 | 42 | 2018 |
MultiCL: Enabling Automatic Scheduling for Task-Parallel Workloads in OpenCL AM Aji, AJ Peña, P Balaji, W Feng Parallel Computing 58, 37-55, 2016 | 35 | 2016 |
Enabling homomorphically encrypted inference for large DNN models G Lloret-Talavera, M Jorda, H Servat, F Boemer, C Chauhan, ... IEEE Transactions on Computers 71 (5), 1145-1155, 2021 | 31 | 2021 |
DMRlib: easy-coding and efficient resource management for job malleability S Iserte, R Mayo, ES Quintana-Ortí, AJ Pena IEEE Transactions on Computers 70 (9), 1443-1457, 2020 | 31 | 2020 |