Title, authors, venue | Cited by | Year |
Argobots: A Lightweight Low-Level Threading and Tasking Framework S Seo, A Amer, P Balaji, C Bordage, G Bosilca, A Brooks, P Carns, ... IEEE Transactions on Parallel and Distributed Systems, 2017 | 155 | 2017 |
MPI+Threads: runtime contention and remedies A Amer, H Lu, Y Wei, P Balaji, S Matsuoka Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of …, 2015 | 62 | 2015 |
MPICH User’s Guide A Amer, P Balaji, W Bland, W Gropp, R Latham, H Lu, L Oden, AJ Pena, ... Version, 2015 | 50* | 2015 |
Why is MPI so slow? Analyzing the fundamental limits in implementing MPI-3.1 K Raffenetti, A Amer, L Oden, C Archer, W Bland, H Fujita, Y Guo, ... Proceedings of the international conference for high performance computing …, 2017 | 38 | 2017 |
Advanced Thread Synchronization for Multithreaded MPI Implementations HV Dang, S Seo, A Amer, P Balaji Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud …, 2017 | 31 | 2017 |
BOLT: Optimizing OpenMP parallel regions with user-level threads S Iwasaki, A Amer, K Taura, S Seo, P Balaji 2019 28th International Conference on Parallel Architectures and Compilation …, 2019 | 27* | 2019 |
Fork-join and data-driven execution models on multi-core architectures: Case study of the FMM A Amer, N Maruyama, M Pericàs, K Taura, R Yokota, S Matsuoka Supercomputing: 28th International Supercomputing Conference, ISC 2013 …, 2013 | 26 | 2013 |
Systemwide power management with Argo D Ellsworth, T Patki, S Perarnau, S Seo, A Amer, J Zounmevo, R Gupta, ... 2016 IEEE International Parallel and Distributed Processing Symposium …, 2016 | 25 | 2016 |
Characterizing MPI and Hybrid MPI+Threads Applications at Scale: Case Study with BFS A Amer, H Lu, P Balaji, S Matsuoka Workshop on Parallel Programming Model for the Masses (PPMM 2015) in … | 22* | |
An efficient abortable-locking protocol for multi-level NUMA systems M Chabbi, A Amer, S Wen, X Liu Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of …, 2017 | 17 | 2017 |
Locking Aspects in Multithreaded MPI Implementations A Amer, H Lu, Y Wei, J Hammond, S Matsuoka, P Balaji | 13* | 2016 |
Analysis of data reuse in task-parallel runtimes M Pericàs, A Amer, K Taura, S Matsuoka High Performance Computing Systems. Performance Modeling, Benchmarking and …, 2014 | 13 | 2014 |
Software combining to mitigate multithreaded MPI contention A Amer, C Archer, M Blocksome, C Cao, M Chuvelev, H Fujita, ... Proceedings of the ACM International Conference on Supercomputing, 367-379, 2019 | 12 | 2019 |
Lock Contention Management in Multithreaded MPI A Amer, H Lu, P Balaji, M Chabbi, Y Wei, J Hammond, S Matsuoka ACM Transactions on Parallel Computing 5 (3), 12:1-12:21, 2019 | 11 | 2019 |
Using BitTorrent and SVC for efficient video sharing and streaming A Abdelhalim, T Ahmed, H Walid-Khaled, S Matsuoka 2012 IEEE Symposium on Computers and Communications (ISCC), 537-543, 2012 | 11 | 2012 |
Lessons learned from analyzing dynamic promotion for user-level threading S Iwasaki, A Amer, K Taura, P Balaji Proceedings of the International Conference for High Performance Computing …, 2018 | 8 | 2018 |
Towards a Dataflow FMM using the OmpSs Programming Model P Miquel, A Abdelhalim, F Keisuke, M Naoya, Y Rio, M Satoshi 136th IPSJ Conference on High Performance Computing 2012 (12), 1-7, 2012 | 8* | 2012 |
Analyzing the performance trade-off in implementing user-level threads S Iwasaki, A Amer, K Taura, P Balaji IEEE Transactions on Parallel and Distributed Systems 31 (8), 1859-1877, 2020 | 7 | 2020 |
Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures A Amer, S Matsuoka, M Pericàs, N Maruyama, K Taura, R Yokota, P Balaji International Workshop on OpenMP, 156-170, 2016 | 6 | 2016 |
Efficient abortable-locking protocol for multi-level NUMA systems: Design and correctness M Chabbi, A Amer, X Liu ACM Transactions on Parallel Computing (TOPC) 7 (3), 1-32, 2020 | 3 | 2020 |