A performance prediction model for Spark applications

MU Javaid, AA Kanoun, F Demesmaeker… - … Conference on Big Data, 2020 - Springer
Apache Spark is a popular open-source distributed processing framework that enables
efficient processing of massive amounts of data. It has a large number of configuration …

Improving Spark application throughput via memory aware task co-location: A mixture of experts approach

VS Marco, B Taylor, B Porter, Z Wang - … of the 18th ACM/IFIP/USENIX …, 2017 - dl.acm.org
Data analytic applications built upon big data processing frameworks such as Apache Spark
are an important class of applications. Many of these applications are not latency-sensitive …

Machine learning for performance prediction of Spark cloud applications

A Maros, F Murai, APC da Silva… - 2019 IEEE 12th …, 2019 - ieeexplore.ieee.org
Big data applications and analytics are employed in many sectors for a variety of goals:
improving customers satisfaction, predicting market behavior or improving processes in …

Debugging Big Data Analytics in Spark with BigDebug

MA Gulzar, M Interlandi, T Condie, M Kim - Proceedings of the 2017 …, 2017 - dl.acm.org
To process massive quantities of data, developers leverage Data-Intensive Scalable
Computing (DISC) systems such as Apache Spark. In terms of debugging, DISC systems …

Amazon SageMaker automatic model tuning: Scalable gradient-free optimization

V Perrone, H Shen, A Zolic, I Shcherbatyi… - Proceedings of the 27th …, 2021 - dl.acm.org
Tuning complex machine learning systems is challenging. Machine learning typically
requires to set hyperparameters, be it regularization, architecture, or optimization …

HyperSpark: A data-intensive programming environment for parallel metaheuristics

M Ciavotta, S Krstić, DA Tamburri… - … Congress on Big …, 2019 - ieeexplore.ieee.org
Metaheuristics are search procedures used to solve complex, often intractable problems for
which other approaches are unsuitable or unable to provide solutions in reasonable times …

Alchemist: An Apache Spark ⇔ MPI interface

A Gittens, K Rothauge, S Wang… - Concurrency and …, 2019 - Wiley Online Library
The Apache Spark framework for distributed computation is popular in the data
analytics community due to its ease of use, but its MapReduce‐style programming model …

Learning Surrogates for Offline Black-Box Optimization via Gradient Matching

M Hoang, A Fadhel, A Deshwal, J Doppa… - Forty-first International …, 2024 - openreview.net
Offline design optimization problem arises in numerous science and engineering
applications including material and chemical design, where expensive online …

SparkNet: Training deep networks in Spark

P Moritz, R Nishihara, I Stoica, MI Jordan - arXiv preprint arXiv:1511.06051, 2015 - arxiv.org
Training deep networks is a time-consuming process, with networks for object recognition
often requiring multiple days to train. For this reason, leveraging the resources of a cluster to …

TensorLightning: A traffic-efficient distributed deep learning on commodity Spark clusters

S Lee, H Kim, J Park, J Jang, CS Jeong, S Yoon - IEEE Access, 2018 - ieeexplore.ieee.org
With the recent success of deep learning, the amount of data and computation continues to
grow daily. Hence a distributed deep learning system that shares the training workload has …