A taxonomy of machine learning clustering algorithms, challenges, and future realms

S Pitafi, T Anwar, Z Sharif - Applied sciences, 2023 - mdpi.com
In the field of data mining, clustering has proven to be an important technique. Numerous
clustering methods have been devised and put into practice, and most of them locate high …
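
As an illustration of one widely used clustering method (not drawn from the survey itself), the sketch below runs k-means via scikit-learn on synthetic two-dimensional data; the data and parameters are invented for the example.

```python
# Minimal k-means illustration (scikit-learn); the survey covers many such methods.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic blobs in 2-D feature space.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # learned centroids
print(km.labels_[:5])        # cluster assignment per sample
```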

I/O access patterns in HPC applications: A 360-degree survey

JL Bez, S Byna, S Ibrahim - ACM Computing Surveys, 2023 - dl.acm.org
The high-performance computing I/O stack has been complex due to multiple software
layers, the inter-dependencies among these layers, and the different performance tuning …

Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

Fluid: Dataset abstraction and elastic acceleration for cloud-native deep learning training jobs

R Gu, K Zhang, Z Xu, Y Che, B Fan… - 2022 IEEE 38th …, 2022 - ieeexplore.ieee.org
Nowadays, it is prevalent to train deep learning (DL) models in cloud-native platforms that
actively leverage containerization and orchestration technologies for high elasticity, low and …

MISO: exploiting multi-instance GPU capability on multi-tenant GPU clusters

B Li, T Patel, S Samsi, V Gadepally… - Proceedings of the 13th …, 2022 - dl.acm.org
GPU technology has been improving at an expedited pace in terms of size and performance,
empowering HPC and AI/ML researchers to advance the scientific discovery process …

Why globally re-shuffle? Revisiting data shuffling in large scale deep learning

TT Nguyen, F Trahay, J Domke, A Drozd… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Stochastic gradient descent (SGD) is the most prevalent algorithm for training Deep Neural
Networks (DNN). SGD iterates over the input data set in each training epoch, processing data …
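
To make the per-epoch traversal concrete, here is a minimal sketch, assuming a plain NumPy index generator rather than any framework loader, of the globally re-shuffled ordering that the paper revisits: every epoch draws a fresh permutation over the whole dataset.

```python
# Sketch of epoch-wise data traversal in SGD: a global re-shuffle draws a fresh
# permutation of the whole dataset at every epoch (partial/local shuffling, the
# alternative the paper examines, would permute only within smaller windows).
import numpy as np

def global_shuffle_epochs(num_samples, num_epochs, batch_size, seed=0):
    rng = np.random.default_rng(seed)
    for epoch in range(num_epochs):
        order = rng.permutation(num_samples)          # full re-shuffle per epoch
        for start in range(0, num_samples, batch_size):
            batch_idx = order[start:start + batch_size]
            yield epoch, batch_idx                    # indices a data loader would fetch

for epoch, batch in global_shuffle_epochs(num_samples=8, num_epochs=2, batch_size=4):
    print(epoch, batch)
```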

Plumber: Diagnosing and removing performance bottlenecks in machine learning data pipelines

M Kuchnik, A Klimovic, J Simsa… - Proceedings of …, 2022 - proceedings.mlsys.org
Input pipelines, which ingest and transform input data, are an essential part of training
Machine Learning (ML) models. However, it is challenging to implement efficient input …
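
As a rough illustration of such a pipeline (a generic tf.data-style sketch, not Plumber's API or the paper's workload), the stages below ingest synthetic records, transform them in parallel, batch them, and prefetch so input work overlaps with training.

```python
# Generic input-pipeline sketch: ingest -> transform -> batch -> prefetch.
import tensorflow as tf

def preprocess(x):
    return tf.cast(x, tf.float32) / 255.0    # simple per-sample transform

ds = (tf.data.Dataset.range(1000)             # stand-in for reading raw samples
        .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
        .batch(32)
        .prefetch(tf.data.AUTOTUNE))          # overlap input work with training

for batch in ds.take(1):
    print(batch.shape)                         # (32,)
```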

SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training

RIS Khan, AH Yazdani, Y Fu, AK Paul, B Ji… - … USENIX Conference on …, 2023 - usenix.org
Deep learning training (DLT) applications exhibit unique I/O workload behaviors that pose
new challenges for storage system design. DLT is I/O intensive since data samples need to …
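
A minimal sketch of the general idea of caching training samples in memory, assuming a toy loader and Python's lru_cache rather than SHADE's actual importance-aware caching policy:

```python
# Generic illustration (not SHADE's policy): an in-memory cache in front of
# slow sample reads, so repeated accesses across epochs avoid storage I/O.
import functools
import time

@functools.lru_cache(maxsize=1024)            # bounded cache of decoded samples
def load_sample(sample_id: int) -> bytes:
    time.sleep(0.001)                          # stand-in for a slow storage read
    return f"sample-{sample_id}".encode()

# The first epoch misses the cache; later epochs hit it for reused samples.
for epoch in range(2):
    start = time.perf_counter()
    for i in range(100):
        load_sample(i)
    print(f"epoch {epoch}: {time.perf_counter() - start:.3f}s")
```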

GPU-enabled asynchronous multi-level checkpoint caching and prefetching

A Maurya, MM Rafique, T Tonellot, HJ AlSalem… - Proceedings of the …, 2023 - dl.acm.org
Checkpointing is an I/O intensive operation increasingly used by High-Performance
Computing (HPC) applications to revisit previous intermediate datasets at scale. Unlike the …
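
A simplified sketch of asynchronous checkpointing in general, assuming an in-memory snapshot handed to a background writer thread; the paper's GPU-side caching and prefetching tiers are not modeled here.

```python
# Simplified asynchronous checkpointing: snapshot the state in memory, then let
# a background thread persist it while the application continues.
import copy
import pickle
import threading

def async_checkpoint(state: dict, path: str) -> threading.Thread:
    snapshot = copy.deepcopy(state)            # in-memory copy on the critical path
    def _write():
        with open(path, "wb") as f:            # slow persistence off the critical path
            pickle.dump(snapshot, f)
    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t

state = {"step": 100, "weights": [0.1, 0.2, 0.3]}
writer = async_checkpoint(state, "ckpt_step100.pkl")
# ... computation would continue here ...
writer.join()                                  # wait before shutdown or next overwrite
```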

High Throughput Training of Deep Surrogates from Large Ensemble Runs

LT Meyer, M Schouler, RA Caulk, A Ribés… - Proceedings of the …, 2023 - dl.acm.org
Recent years have seen a surge in deep learning approaches to accelerate numerical
solvers, which provide faithful but computationally intensive simulations of the physical …