Scaling distributed machine learning with the parameter server

M Li, DG Andersen, JW Park, AJ Smola… - … USENIX Symposium on …, 2014 - usenix.org
We propose a parameter server framework for distributed machine learning problems. Both
data and workloads are distributed over worker nodes, while the server nodes maintain …

Apache Hadoop YARN: Yet another resource negotiator

VK Vavilapalli, AC Murthy, C Douglas… - Proceedings of the 4th …, 2013 - dl.acm.org
The initial design of Apache Hadoop [1] was tightly focused on running massive,
MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has …

Apache Tez: A unifying framework for modeling and building data processing applications

B Saha, H Shah, S Seth, G Vijayaraghavan… - Proceedings of the …, 2015 - dl.acm.org
The broad success of Hadoop has led to a fast-evolving and diverse ecosystem of
application engines that are building upon the YARN resource management layer. The open …

Mercury: Hybrid centralized and distributed scheduling in large shared clusters

K Karanasos, S Rao, C Curino, C Douglas… - 2015 USENIX Annual …, 2015 - usenix.org
Datacenter-scale computing for analytics workloads is increasingly common. High
operational costs force heterogeneous applications to share cluster resources for achieving …

Trill: A high-performance incremental query processor for diverse analytics

B Chandramouli, J Goldstein, M Barnett… - Proceedings of the …, 2014 - dl.acm.org
This paper introduces Trill--a new query processor for analytics. Trill fulfills a combination of
three requirements for a query processor to serve the diverse big data analytics space: (1) …

Resource elasticity for large-scale machine learning

B Huang, M Boehm, Y Tian, B Reinwald… - Proceedings of the …, 2015 - dl.acm.org
Declarative large-scale machine learning (ML) aims at flexible specification of ML algorithms
and automatic generation of hybrid runtime plans ranging from single node, in-memory …

Scaling distributed machine learning with system and algorithm co-design

M Li - Santa Clara, CA, USA: Intel, 2017 - reports-archive.adm.cs.cmu.edu
Due to the rapid growth of data and the ever increasing model complexity, which often
manifests itself in the large number of model parameters, today, many important machine …

User behavior modeling with large-scale graph analysis

A Beutel - Computer Science Department, Carnegie …, 2016 - reports-archive.adm.cs.cmu.edu
Can we model how fraudsters work to distinguish them from normal users? Can we predict
not just which movie a person will like, but also why? How can we find when a student will …

Dolphin: Runtime optimization for distributed machine learning

YSL Lee, M Weimer, Y Yang… - Proc. of ICML ML …, 2016 - bj2.web.engr.illinois.edu
Large-scale machine learning (ML) systems are becoming widely used. Typically, these ML
systems run on fixed resources, but it is difficult to find their optimal configurations (e.g., how …

Performance evaluation of job schedulers on Hadoop YARN

JC Lin, MC Lee - Concurrency and Computation: Practice and …, 2016 - Wiley Online Library
To solve the limitation of Hadoop on scalability, resource sharing, and application support,
the open‐source community proposes the next generation of Hadoop's compute platform …