A performance prediction model for Spark applications
MU Javaid, AA Kanoun, F Demesmaeker… - … Conference on Big Data, 2020 - Springer
Apache Spark is a popular open-source distributed processing framework that enables
efficient processing of massive amounts of data. It has a large number of configuration …
Improving Spark application throughput via memory-aware task co-location: A mixture of experts approach
Data analytic applications built upon big data processing frameworks such as Apache Spark
are an important class of applications. Many of these applications are not latency-sensitive …
Machine learning for performance prediction of Spark cloud applications
Big data applications and analytics are employed in many sectors for a variety of goals:
improving customer satisfaction, predicting market behavior or improving processes in …
Debugging Big Data Analytics in Spark with BigDebug
To process massive quantities of data, developers leverage Data-Intensive Scalable
Computing (DISC) systems such as Apache Spark. In terms of debugging, DISC systems …
Amazon SageMaker automatic model tuning: Scalable gradient-free optimization
Tuning complex machine learning systems is challenging. Machine learning typically
requires setting hyperparameters, be it regularization, architecture, or optimization …
HyperSpark: A data-intensive programming environment for parallel metaheuristics
Metaheuristics are search procedures used to solve complex, often intractable problems for
which other approaches are unsuitable or unable to provide solutions in reasonable times …
Alchemist: An Apache Spark ⇔ MPI interface
The Apache Spark framework for distributed computation is popular in the data
analytics community due to its ease of use, but its MapReduce‐style programming model …
Learning Surrogates for Offline Black-Box Optimization via Gradient Matching
Offline design optimization problems arise in numerous science and engineering
applications including material and chemical design, where expensive online …
SparkNet: Training deep networks in Spark
Training deep networks is a time-consuming process, with networks for object recognition
often requiring multiple days to train. For this reason, leveraging the resources of a cluster to …
TensorLightning: A traffic-efficient distributed deep learning on commodity Spark clusters
With the recent success of deep learning, the amount of data and computation continues to
grow daily. Hence a distributed deep learning system that shares the training workload has …