[PDF] Benchmarking Apache Spark with machine learning applications

J Wei, JK Kim, GA Gibson - Parallel Data Lab., Carnegie Mellon Univ …, 2016 - pdl.cmu.edu
Abstract We benchmarked Apache Spark with a popular parallel machine learning training
application, Distributed Stochastic Gradient Descent for Matrix Factorization [5] and …
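The benchmarked workload, DSGD for matrix factorization, rests on stratification. The rating matrix is cut into d × d blocks, and a stratum is a set of d blocks that share no row block and no column block. The SGD updates inside those blocks therefore touch disjoint parts of the factor matrices W and H and could run on d workers at once. The following is a minimal single-process Python sketch under our own assumptions: the names `stratum` and `dsgd_epoch`, the cyclic-shift choice of strata, and the learning-rate and regularization values are hypothetical. It is not the Spark implementation benchmarked in the paper.

```python
import random


def stratum(s, d):
    """Blocks (row_block, col_block) in stratum s of a d x d blocking.

    Cyclic shifts give d blocks with pairwise distinct row and column blocks.
    """
    return [(b, (b + s) % d) for b in range(d)]


def dsgd_epoch(ratings, W, H, d, lr=0.01, reg=0.05, seed=0):
    """One DSGD epoch, run sequentially here; updates W and H in place.

    ratings: list of (row i, column j, value v); W[i] and H[j] are factor vectors (lists).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(W), len(H)
    blocks = {}
    for i, j, v in ratings:
        key = (i * d // n_rows, j * d // n_cols)  # which of the d x d blocks holds (i, j)
        blocks.setdefault(key, []).append((i, j, v))
    strata = list(range(d))
    rng.shuffle(strata)  # visit the d strata in random order
    for s in strata:
        # The blocks of one stratum touch disjoint rows of W and columns of H,
        # so on a cluster each block could be handed to a different worker.
        for blk in stratum(s, d):
            entries = blocks.get(blk, [])
            rng.shuffle(entries)
            for i, j, v in entries:
                err = v - sum(a * b for a, b in zip(W[i], H[j]))
                for k in range(len(W[i])):
                    w_ik, h_jk = W[i][k], H[j][k]
                    W[i][k] += lr * (err * h_jk - reg * w_ik)
                    H[j][k] += lr * (err * w_ik - reg * h_jk)
```

Any set of d blocks forming a permutation would be a valid stratum; cyclic shifts are simply the easiest family to enumerate.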

TIE: Fast Experiment-driven ML-based Configuration Tuning for In-memory Data Analytics

C Chen, J Xin, Z Yu - IEEE Transactions on Computers, 2024 - ieeexplore.ieee.org
Recently, experiment-driven machine-learning (ML) based configuration tuning for in-memory data analytics such as Apache Spark has become popular because it can achieve …

Auto-tuning Spark configurations based on neural network

J Gu, Y Li, H Tang, Z Wu - 2018 IEEE International Conference …, 2018 - ieeexplore.ieee.org
For massive data processing platforms such as Spark, configuration tuning is a necessary
step since it is closely related to task parallelism, resource allocation and fault tolerance …
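To make the link between configuration and these three concerns concrete, the sketch below groups a few real Spark properties by concern and derives the number of task slots that can run concurrently. The values, the choice of properties, and the helper names `task_slots` and `to_submit_args` are illustrative assumptions; the snippet does not say which parameters the paper actually tunes.

```python
# Illustrative Spark properties (values are assumptions, not tuned results).
conf = {
    # Task parallelism
    "spark.executor.instances": "10",
    "spark.executor.cores": "4",
    "spark.task.cpus": "1",
    "spark.default.parallelism": "80",  # about 2 tasks per core, in line with Spark's tuning advice
    # Resource allocation
    "spark.executor.memory": "8g",
    # Fault tolerance: failures of any one task tolerated before Spark gives up on the job
    "spark.task.maxFailures": "4",
}


def task_slots(conf):
    """Number of tasks that can run at once across all executors."""
    executors = int(conf["spark.executor.instances"])
    cores = int(conf["spark.executor.cores"])
    cpus_per_task = int(conf.get("spark.task.cpus", "1"))
    return executors * (cores // cpus_per_task)


def to_submit_args(conf):
    """Render the properties as spark-submit --conf KEY=VALUE arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(task_slots(conf))  # 10 executors * 4 cores = 40
print(" ".join(to_submit_args(conf)))
```

A tuner searches over exactly this kind of key/value space; note that raising `spark.task.cpus` lowers the slot count even when executors and cores stay fixed.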

[BOOK] Spark: The definitive guide: Big data processing made simple

B Chambers, M Zaharia - 2018 - books.google.com
Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide,
written by the creators of the open-source cluster-computing framework. With an emphasis …

Meteor: Optimizing Spark-on-YARN for short applications

H Zhang, H Huang, L Wang - Future Generation Computer Systems, 2019 - Elsevier
Due to its speed and ease of use, Spark has become a popular tool amongst data scientists
to analyze data in various sizes. Counter-intuitively, data processing workloads in industrial …

QaaD (query-as-a-data): Scalable execution of massive number of small queries in Spark

Y Park, B Tak, WS Han - Proceedings of the ACM on Management of …, 2023 - dl.acm.org
The Spark big data processing platform is heavily used in today's IT services for various critical
applications such as machine learning tasks for service recommendations or massive …

Tuning configuration of Apache Spark on public clouds by combining multi-objective optimization and performance prediction model

G Cheng, S Ying, B Wang - Journal of Systems and Software, 2021 - Elsevier
Choosing the right configuration for Spark deployed in the public cloud to ensure the
efficient running of periodic jobs is hard, because there can be a huge configuration space …
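One common way to weigh several objectives at once, such as job runtime and cloud cost, is to keep only the Pareto-optimal (non-dominated) configurations. The sketch below assumes a performance-prediction model has already produced (runtime, cost) estimates for each candidate; the candidate names and numbers are made-up toy values, and `dominates` and `pareto_front` are our own helpers rather than the paper's method.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and strictly better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of the candidates that no other candidate dominates."""
    return sorted(
        name
        for name, obj in candidates.items()
        if not any(dominates(other, obj) for other_name, other in candidates.items() if other_name != name)
    )


# Hypothetical predicted (runtime in seconds, cost in USD) per configuration.
predicted = {
    "cfg_a": (100, 2.0),
    "cfg_b": (80, 3.0),
    "cfg_c": (120, 2.5),
    "cfg_d": (90, 2.0),
}
print(pareto_front(predicted))  # ['cfg_b', 'cfg_d']
```

Here `cfg_d` is at least as cheap as and faster than both `cfg_a` and `cfg_c`, so only the faster-but-pricier `cfg_b` and the cheaper `cfg_d` survive; a periodic job would then pick between them by its deadline or budget.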

PS2: Parameter server on Spark

Z Zhang, B Cui, Y Shao, L Yu, J Jiang… - Proceedings of the 2019 …, 2019 - dl.acm.org
Most of the data in the Tencent Machine Learning Platform is extracted and processed by Spark.
However, few of these workloads use Spark MLlib, an official machine learning (ML) library on top of …
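The parameter-server pattern that PS2 brings to Spark can be illustrated in a few lines. Model weights live on server shards, and each worker (on Spark, roughly one task per data partition) pulls only the weights its mini-batch touches, computes a gradient, and pushes it back. The class and function names are hypothetical, the servers are simulated in one process, and squared loss with plain SGD is an assumption; this is not PS2's API.

```python
class ParameterServer:
    """In-process stand-in for a sharded parameter server (hypothetical, not PS2's API)."""

    def __init__(self, num_shards, lr):
        self.shards = [{} for _ in range(num_shards)]
        self.lr = lr

    def _shard(self, key):
        return self.shards[key % len(self.shards)]  # keys are integer feature ids

    def pull(self, keys):
        """Return current weights for the requested keys (0.0 if never updated)."""
        return {k: self._shard(k).get(k, 0.0) for k in keys}

    def push(self, grads):
        """Apply the SGD update w -= lr * g for each pushed gradient."""
        for k, g in grads.items():
            shard = self._shard(k)
            shard[k] = shard.get(k, 0.0) - self.lr * g


def worker_step(ps, batch):
    """One worker iteration on a sparse mini-batch of (features, label) pairs, squared loss."""
    keys = sorted({k for feats, _ in batch for k in feats})
    w = ps.pull(keys)  # fetch only the weights this batch needs
    grads = dict.fromkeys(keys, 0.0)
    for feats, y in batch:
        err = sum(w[k] * x for k, x in feats.items()) - y
        for k, x in feats.items():
            grads[k] += err * x / len(batch)
    ps.push(grads)


# Two toy "partitions", processed in turn as two workers would.
ps = ParameterServer(num_shards=2, lr=0.5)
partitions = [
    [({0: 1.0}, 2.0), ({1: 1.0}, -1.0)],
    [({0: 1.0, 2: 1.0}, 3.0)],
]
for _ in range(200):
    for batch in partitions:
        worker_step(ps, batch)
print(ps.pull([0, 1, 2]))
```

Because each worker exchanges only the keys it touches, communication scales with the sparsity of the data rather than with the full model size.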

[BOOK] Machine learning with Spark

N Pentreath - 2015 - bzz.wallizard.com
In recent years, the volume of data being collected, stored, and analyzed has exploded, in
particular in relation to the activity on the Web and mobile devices, as well as data from the …

Speedup your analytics: Automatic parameter tuning for databases and big data systems

J Lu, Y Chen, H Herodotou, S Babu - Proceedings of the VLDB …, 2019 - dl.acm.org
Database and big data analytics systems such as Hadoop and Spark have a large number
of configuration parameters that control memory distribution, I/O optimization, parallelism …