RWKV: Reinventing RNNs for the Transformer Era B Peng, E Alcaide, Q Anthony, A Albalak, S Arcadinho, S Biderman, ... Conference on Empirical Methods in Natural Language Processing, 2023 | 266 | 2023 |
RWKV: Reinventing RNNs for the Transformer Era B Peng, E Alcaide, Q Anthony, A Albalak, S Arcadinho, S Biderman, ... arXiv preprint arXiv:2305.13048, 2023 | 263 | 2023 |
Designing high-performance MPI libraries with on-the-fly compression for modern GPU clusters Q Zhou, C Chu, NS Kumar, P Kousha, SM Ghazimirsaeed, H Subramoni, ... 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021 | 27 | 2021 |
Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters Q Zhou, P Kousha, Q Anthony, KS Khorassani, A Shafi, H Subramoni, ... High Performance Computing: 37th International Conference, ISC High …, 2022 | 16 | 2022 |
Accelerating MPI all-to-all communication with online compression on modern GPU clusters Q Zhou, P Kousha, Q Anthony, K Shafie Khorassani, A Shafi, ... International Conference on High Performance Computing, 3-25, 2022 | 16 | 2022 |
Dynamic kernel fusion for bulk non-contiguous data transfer on GPU clusters CH Chu, KS Khorassani, Q Zhou, H Subramoni, DK Panda 2020 IEEE International Conference on Cluster Computing (CLUSTER), 130-141, 2020 | 8 | 2020 |
Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication Q Zhou, Q Anthony, L Xu, A Shafi, M Abduljabbar, H Subramoni, ... 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2023 | 7 | 2023 |
Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads Q Zhou, Q Anthony, A Shafi, H Subramoni, DK Panda 2022 IEEE 29th International Conference on High Performance Computing, Data …, 2022 | 5 | 2022 |
A hierarchical and load-aware design for large message neighborhood collectives SM Ghazimirsaeed, Q Zhou, A Ruhela, M Bayatpour, H Subramoni, ... SC20: International Conference for High Performance Computing, Networking …, 2020 | 5 | 2020 |
Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries B Ramesh, Q Zhou, A Shafi, M Abduljabbar, H Subramoni, DK Panda 2022 IEEE 29th International Conference on High Performance Computing, Data …, 2022 | 3 | 2022 |
MPI-xCCL: A Portable MPI Library over Collective Communication Libraries for Various Accelerators CC Chen, K Shafie Khorassani, P Kousha, Q Zhou, J Yao, H Subramoni, ... Proceedings of the SC'23 Workshops of The International Conference on High …, 2023 | 2 | 2023 |
Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters Q Zhou, B Ramesh, A Shafi, M Abduljabbar, H Subramoni, DK Panda ISC High Performance 2024 Research Paper Proceedings (39th International …, 2024 | | 2024 |
Accelerating Large Language Model Training with Hybrid GPU-based Compression L Xu, Q Anthony, Q Zhou, N Alnaasan, R Gulhane, A Shafi, H Subramoni IEEE/ACM, 2024 | | 2024 |
Benchmarking Modern Databases for Storing and Profiling Very Large Scale HPC Communication Data P Kousha, Q Zhou, H Subramoni, DK Panda International Symposium on Benchmarking, Measuring and Optimization, 104-119, 2023 | | 2023 |