Profitable loop fusion and tiling using model-driven empirical search A Qasem, K Kennedy Proceedings of the 20th annual international conference on Supercomputing …, 2006 | 76 | 2006 |
Automatic tuning of whole applications using direct search and a performance-based transformation system A Qasem, K Kennedy, J Mellor-Crummey The Journal of Supercomputing 36 (2), 183-196, 2006 | 67 | 2006 |
Understanding stencil code performance on multicore architectures SMF Rahman, Q Yi, A Qasem Proceedings of the 8th ACM International Conference on Computing Frontiers, 1-10, 2011 | 65 | 2011 |
Maximizing hardware prefetch effectiveness with machine learning S Rahman, M Burtscher, Z Zong, A Qasem 2015 IEEE 17th International Conference on High Performance Computing and …, 2015 | 50 | 2015 |
Automatic restructuring of GPU kernels for exploiting inter-thread data locality S Unkule, C Shaltz, A Qasem Compiler Construction: 21st International Conference, CC 2012, Held as Part …, 2012 | 46 | 2012 |
Improving performance with integrated program transformations A Qasem, G Jin, J Mellor-Crummey manuscript, October, 2003 | 36 | 2003 |
Exploring the optimization space of dense linear algebra kernels Q Yi, A Qasem Languages and Compilers for Parallel Computing: 21th International Workshop …, 2008 | 27 | 2008 |
A module-based approach to adopting the 2013 ACM curricular recommendations on parallel computing M Burtscher, W Peng, A Qasem, H Shi, D Tamir, H Thiry Proceedings of the 46th ACM technical symposium on computer science …, 2015 | 19 | 2015 |
A cache-conscious profitability model for empirical tuning of loop fusion A Qasem, K Kennedy International Workshop on Languages and Compilers for Parallel Computing …, 2005 | 19 | 2005 |
Automatic tuning of scientific applications A Qasem Rice University, 2007 | 14 | 2007 |
Evaluating a model for cache conflict miss prediction A Qasem, K Kennedy Technical Report CS-TR05-457, Rice University, 2005 | 14 | 2005 |
Balancing locality and parallelism on shared-cache multi-core systems MJ Cade, A Qasem 2009 11th IEEE International Conference on High Performance Computing and …, 2009 | 13 | 2009 |
A module-based introduction to heterogeneous computing in core courses A Qasem, DP Bunde, P Schielke Journal of Parallel and Distributed Computing 158, 56-66, 2021 | 12 | 2021 |
An Evaluation of Parallel Knapsack Algorithms on Multicore Architectures. H Rashid, C Novoa, A Qasem CSC 1, 230-235, 2010 | 12 | 2010 |
Automatically selecting profitable thread block sizes for accelerated kernels TA Connors, A Qasem 2017 IEEE 19th International Conference on High Performance Computing and …, 2017 | 11 | 2017 |
Characterizing data organization effects on heterogeneous memory architectures A Qasem, AM Aji, G Rodgers 2017 IEEE/ACM International Symposium on Code Generation and Optimization …, 2017 | 11 | 2017 |
A SIMD solution for the quadratic assignment problem with GPU acceleration A Chaparala, C Novoa, A Qasem Proceedings of the 2014 Annual Conference on Extreme Science and Engineering …, 2014 | 11 | 2014 |
A SIMD tabu search implementation for solving the quadratic assignment problem with GPU acceleration C Novoa, A Qasem, A Chaparala Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by …, 2015 | 10 | 2015 |
A case for compiler-driven superpage allocation J Magee, A Qasem Proceedings of the 47th Annual Southeast Regional Conference, 1-4, 2009 | 9 | 2009 |
Evaluating the role of optimization-specific search heuristics in effective autotuning J Guo, Q Yi, A Qasem Department of Computer Science, University of Texas at San Antonio, 2010 | 8 | 2010 |