[BOOK][B] Spark: Big data cluster computing in production

I Ganelin, E Orhian, K Sasaki, B York - 2016 - books.google.com
Production-targeted Spark guidance with real-world use cases Spark: Big Data Cluster
Computing in Production goes beyond general Spark overviews to provide targeted …

Descending through a crowded valley - benchmarking deep learning optimizers

RM Schmidt, F Schneider… - … Conference on Machine …, 2021 - proceedings.mlr.press
Choosing the optimizer is considered to be among the most crucial design decisions in deep
learning, and it is not an easy one. The growing literature now lists hundreds of optimization …

A heterogeneity-aware task scheduler for spark

L Xu, AR Butt, SH Lim, R Kannan - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Big data processing systems such as Spark are employed in an increasing number of
diverse applications, such as machine learning, graph computation, and scientific computing …

Open source vizier: Distributed infrastructure and api for reliable and flexible blackbox optimization

X Song, S Perel, C Lee, G Kochanski… - International …, 2022 - proceedings.mlr.press
Vizier is the de-facto blackbox optimization service across Google, having optimized some of
Google's largest products and research efforts. To operate at the scale of tuning thousands …

Hyper-tune: Towards efficient hyper-parameter tuning at scale

Y Li, Y Shen, H Jiang, W Zhang, J Li, J Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
The ever-growing demand and complexity of machine learning are putting pressure on
hyper-parameter tuning systems: while the evaluation cost of models continues to increase …

Effective data management strategy and RDD weight cache replacement strategy in Spark

K Jiang, S Du, F Zhao, Y Huang, C Li, Y Luo - Computer Communications, 2022 - Elsevier
With the dramatic increase in internet users and their demand for real-time network
performance, the Spark distributed computing environment has emerged. It is widely used …

Optimizing shuffle in wide-area data analytics

S Liu, H Wang, B Li - 2017 IEEE 37th International Conference …, 2017 - ieeexplore.ieee.org
As increasingly large volumes of raw data are generated at geographically distributed
datacenters, they need to be efficiently processed by data analytic jobs spanning multiple …

Adaptively accelerating map-reduce/spark with GPUs: A case study

KR Jayaram, A Gandhi, H Xin… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
In this paper, we propose and evaluate a simple mechanism to accelerate iterative machine
learning algorithms implemented in Hadoop map-reduce (stock), and Apache Spark. In …

[BOOK][B] Data Analytics with Spark Using Python

J Aven - 2018 - books.google.com
Spark is at the heart of today's Big Data revolution, helping data professionals supercharge
efficiency and performance in a wide range of data processing and analytics tasks. In this …

SMBSP: a self-tuning approach using machine learning to improve performance of spark in big data processing

MA Rahman, J Hossen… - 2018 7th International …, 2018 - ieeexplore.ieee.org
Apache Spark, popularly known for big data processing capability, is a distributed open-
source platform that uses the concept of distributed memory to facilitate big data processing …