Spark parameter tuning via trial-and-error

H Herodotou, Y Chen, J Lu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org

Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of
configuration parameters controlling parallelism, I/O behavior, memory settings, and …

Cited by 98 · Related articles · All 10 versions

A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench

N Ahmed, ALC Barczak, T Susnjak, MA Rashid - Journal of Big Data, 2020 - Springer

Big Data analytics for storing, processing, and analyzing large-scale datasets has become
an essential tool for the industry. The advent of distributed computing frameworks such as …

Cited by 86 · Related articles · All 18 versions

Using machine learning to optimize parallelism in big data applications

ÁB Hernández, MS Perez, S Gupta… - Future Generation …, 2018 - Elsevier

In-memory cluster computing platforms have gained momentum in the last years, due to their
ability to analyse big amounts of data in parallel. These platforms are complex and difficult-to …

Cited by 98 · Related articles · All 15 versions

A methodology for Spark parameter tuning

A Gounaris, J Torres - Big Data Research, 2018 - Elsevier

Spark has been established as an attractive platform for big data analysis, since it manages
to hide most of the complexities related to parallelism, fault tolerance and cluster setting from …

Cited by 82 · Related articles · All 7 versions

Efficient performance prediction for Apache Spark

G Cheng, S Ying, B Wang, Y Li - Journal of Parallel and Distributed …, 2021 - Elsevier

Spark is a more efficient distributed big data processing framework following Hadoop. It
provides users with more than 180 adjustable configuration parameters, and how to choose …

Cited by 32 · Related articles

LOCAT: Low-overhead online configuration auto-tuning of Spark SQL applications

J Xin, K Hwang, Z Yu - … of the 2022 International Conference on …, 2022 - dl.acm.org

Spark SQL has been widely deployed in industry but it is challenging to tune its
performance. Recent studies try to employ machine learning (ML) to solve this problem, but …

Cited by 16 · Related articles · All 4 versions

Towards general and efficient online tuning for Spark

Y Li, H Jiang, Y Shen, Y Fang, X Yang, D Huang… - arXiv preprint arXiv …, 2023 - arxiv.org

The distributed data analytic system--Spark is a common choice for processing massive
volumes of heterogeneous data, while it is challenging to tune its parameters to achieve …

Cited by 5 · Related articles · All 4 versions

Rover: An online Spark SQL tuning service via generalized transfer learning

Y Shen, X Ren, Y Lu, H Jiang, H Xu, D Peng… - Proceedings of the 29th …, 2023 - dl.acm.org

Distributed data analytic engines like Spark are common choices to process massive data in
industry. However, the performance of Spark SQL highly depends on the choice of …

Cited by 7 · Related articles · All 3 versions

Runtime prediction of big data jobs: performance comparison of machine learning algorithms and analytical models

N Ahmed, ALC Barczak, MA Rashid, T Susnjak - Journal of Big Data, 2022 - Springer

Due to the rapid growth of available data, various platforms offer parallel infrastructure that
efficiently processes big data. One of the critical issues is how to use these platforms to …

Cited by 11 · Related articles · All 10 versions

You only run once: Spark auto-tuning from a single run

DB Prats, FA Portella, CHA Costa… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Tuning configurations of Spark jobs is not a trivial task. State-of-the-art auto-tuning systems
are based on iteratively running workloads with different configurations. During the …

Cited by 23 · Related articles · All 4 versions