Optimizing load balancing and data-locality with data-aware scheduling

K Wang, X Zhou, T Li, D Zhao, M Lang… - … Conference on Big …, 2014 - ieeexplore.ieee.org
Load balancing techniques (e.g., work stealing) are important to obtain the best performance
for distributed task scheduling systems that have multiple schedulers making scheduling …

FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems

D Zhao, Z Zhang, X Zhou, T Li, K Wang… - … conference on big …, 2014 - ieeexplore.ieee.org
State-of-the-art, yet decades-old, architecture of high-performance computing systems has
its compute and storage resources separated. It thus is limited for modern data-intensive …

Flux: A next-generation resource management framework for large HPC centers

DH Ahn, J Garlick, M Grondona, D Lipari… - 2014 43rd …, 2014 - ieeexplore.ieee.org
Resource and job management software is crucial to High Performance Computing (HPC)
for efficient application execution. However, current systems and approaches can no longer …

PapyrusKV: A high-performance parallel key-value store for distributed NVM architectures

J Kim, S Lee, JS Vetter - Proceedings of the International Conference for …, 2017 - dl.acm.org
This paper introduces PapyrusKV, a parallel embedded key-value store (KVS) for distributed
high-performance computing (HPC) architectures that offer potentially massive pools of …

I/O-aware batch scheduling for petascale computing systems

Z Zhou, X Yang, D Zhao, P Rich, W Tang… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
In the Big Data era, the gap between the storage performance and an application's I/O
requirement is increasing. I/O congestion caused by concurrent storage accesses from …

Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales

K Wang, K Qiao, I Sadooghi, X Zhou… - Concurrency and …, 2016 - Wiley Online Library
Data‐driven programming models such as many‐task computing (MTC) have been
prevalent for running data‐intensive scientific applications. MTC applies over …

Next generation job management systems for extreme-scale ensemble computing

K Wang, X Zhou, H Chen, M Lang, I Raicu - Proceedings of the 23rd …, 2014 - dl.acm.org
With the exponential growth of supercomputers in parallelism, applications are growing
more diverse, including traditional large-scale HPC MPI jobs, and ensemble workloads such …

Overcoming Hadoop scaling limitations through distributed task execution

K Wang, N Liu, I Sadooghi, X Yang… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
Data driven programming models like MapReduce have gained the popularity in large-scale
data processing. Although great efforts through the Hadoop implementation and framework …

Towards scalable distributed workload manager with monitoring-based weakly consistent resource stealing

K Wang, X Zhou, K Qiao, M Lang… - Proceedings of the 24th …, 2015 - dl.acm.org
One way to efficiently utilize the coming exascale machines is to support a mixture of
applications in various domains, such as traditional large-scale HPC, the ensemble runs …

Exploring the design tradeoffs for extreme-scale high-performance computing system software

K Wang, A Kulkarni, M Lang, D Arnold… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Owing to the extreme parallelism and the high component failure rates of tomorrow's
exascale, high-performance computing (HPC) system software will need to be scalable …