Hyperdrive: Exploring hyperparameters with POP scheduling

J Rasley, Y He, F Yan, O Ruwase… - Proceedings of the 18th …, 2017 - dl.acm.org
The quality of machine learning (ML) and deep learning (DL) models is very sensitive to
many different adjustable parameters that are set before training even begins, commonly …

Cost-effective resource provisioning for Spark workloads

Y Chen, J Lu, C Chen, M Hoque… - Proceedings of the 28th …, 2019 - dl.acm.org
Spark is one of the prevalent big data analytical platforms. Configuring proper resource
provision for Spark jobs is challenging but essential for organizations to save time, achieve …

Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data

G Essertel, R Tahboub, J Decker, K Brown… - … USENIX Symposium on …, 2018 - usenix.org
In recent years, Apache Spark has become the de facto standard for big data processing.
Spark has enabled a wide audience of users to process petabyte-scale workloads due to its …

SparkBench: a Spark benchmarking suite characterizing large-scale in-memory data analytics

M Li, J Tan, Y Wang, L Zhang, V Salapura - Cluster Computing, 2017 - Springer
Spark has been increasingly employed by industries for big data analytics recently, due to its
resilience, scalability and efficient in-memory distributed programming model. Meanwhile …

Model averaging in distributed machine learning: a case study with Apache Spark

Y Guo, Z Zhang, J Jiang, W Wu, C Zhang, B Cui, J Li - The VLDB Journal, 2021 - Springer
The increasing popularity of Apache Spark has attracted many users to put their data into its
ecosystem. On the other hand, it has been witnessed in the literature that Spark is slow …

Elastic executor provisioning for iterative workloads on Apache Spark

D Yang, W Rang, D Cheng, Y Wang… - … Conference on Big …, 2019 - ieeexplore.ieee.org
In-memory data analytic frameworks like Apache Spark are employed by an increasing
number of diverse applications, such as machine learning, graph computation, and scientific …

Optimizing performance of Real-Time Big Data stateful streaming applications on Cloud

A Gupta, S Jain - 2022 IEEE International Conference on Big …, 2022 - ieeexplore.ieee.org
Exponential growth in the volume of data generated over the last decade has triggered
massive research and adoption of distributed big data analytics platforms. In real-time …

Tuneful: An online significance-aware configuration tuner for big data analytics

A Fekry, L Carata, T Pasquier, A Rice… - arXiv preprint arXiv …, 2020 - arxiv.org
Distributed analytics engines such as Spark are a common choice for processing extremely
large datasets. However, finding good configurations for these systems remains challenging …

Towards automatic tuning of Apache Spark configuration

N Nguyen, MMH Khan, K Wang - 2018 IEEE 11th International …, 2018 - ieeexplore.ieee.org
Apache Spark provides a large number of configuration settings that may be tuned to
improve the performance of specific applications running on the platform. However, it is non …

How data volume affects Spark based data analytics on a scale-up server

AJ Awan, M Brorsson, V Vlassov, E Ayguade - Big Data Benchmarks …, 2016 - Springer
Sheer increase in volume of data over the last decade has triggered research in cluster
computing frameworks that enable web enterprises to extract big insights from big data …