mrMoulder: A recommendation-based adaptive parameter tuning approach for big data processing platform

L Cai, Y Qi, W Wei, J Wu, J Li - Future Generation Computer Systems, 2019 - Elsevier
Nowadays the world has entered the big data era. Big data processing platforms, such as
Hadoop and Spark, are increasingly adopted by many applications, in which there are …

Scaling Spark in the real world: performance and usability

M Armbrust, T Das, A Davidson, A Ghodsi… - Proceedings of the …, 2015 - dl.acm.org
Apache Spark is one of the most widely used open source processing engines for big data,
with rich language-integrated APIs and a wide range of libraries. Over the past two years …

ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems

J Lian, X Zhang, Y Shao, Z Pu, Q Xiang, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
The past decade has seen rapid growth of distributed stream data processing systems.
Under these systems, a stream application is realized as a Directed Acyclic Graph (DAG) of …

Flare: Native compilation for heterogeneous workloads in Apache Spark

GM Essertel, RY Tahboub, JM Decker… - arXiv preprint arXiv …, 2017 - arxiv.org
The need for modern data analytics to combine relational, procedural, and map-reduce-style
functional processing is widely recognized. State-of-the-art systems like Spark have added …

Intelligent Pooling: Proactive Resource Provisioning in Large-scale Cloud Service

D Ravikumar, A Yeo, Y Zhu, A Lakra… - Proceedings of the …, 2024 - dl.acm.org
The proliferation of big data and analytic workloads has driven the need for cloud compute
and cluster-based job processing. With Apache Spark, users can process terabytes of data …

A model driven approach towards improving the performance of Apache Spark applications

K Wang, MMH Khan, N Nguyen… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Apache Spark applications often execute in multiple stages where each stage consists of
multiple tasks running in parallel. However, prior efforts noted that the execution time of …

[BOOK][B] Mastering Apache Spark 2.x

R Kienzler - 2017 - books.google.com
Advanced analytics on your Big Data with latest Apache Spark 2.x About This Book An
advanced guide with a combination of instructions and practical examples to extend the …

Spark-DIY: A framework for interoperable Spark operations with high performance block-based data models

S Caíno-Lores, J Carretero, B Nicolae… - 2018 IEEE/ACM 5th …, 2018 - ieeexplore.ieee.org
Today's scientific applications are increasingly relying on a variety of data sources, storage
facilities, and computing infrastructures, and there is a growing demand for data analysis …

Insights on Apache Spark usage by mining Stack Overflow questions

LJ Rodríguez, X Wang, J Kuang - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Apache Spark is one of the most popular big data tools. Despite its popularity, there are no
studies regarding its overall usage among software developers. As such, essential …

Optimizations of Distributed Computing Processes on Apache Spark Platform.

T Hajji, R Loukili, I El Hassani… - … International Journal of …, 2023 - search.ebscohost.com
The frequently difficult process of examining large and diverse amounts of information is
known as "big data analysis." The goal is to find insights, such as hidden patterns …