The case for tiny tasks in compute clusters

E Jonas, Q Pu, S Venkataraman, I Stoica… - Proceedings of the 2017 …, 2017 - dl.acm.org

Distributed computing remains inaccessible to a large number of users, in spite of many
open source platforms and extensive commercial offerings. While distributed computation …

Cited by 620 · Related articles · All 16 versions

Cluster frameworks for efficient scheduling and resource allocation in data center networks: A survey

K Wang, Q Zhou, S Guo, J Luo - IEEE Communications Surveys …, 2018 - ieeexplore.ieee.org

Data centers are widely used for big data analytics, which often involve data-parallel jobs,
including query and web service. Meanwhile, cluster frameworks are rapidly developed for …

Cited by 71 · Related articles · All 5 versions

Sparrow: distributed, low latency scheduling

K Ousterhout, P Wendell, M Zaharia… - Proceedings of the twenty …, 2013 - dl.acm.org

Large-scale data analytics frameworks are shifting towards shorter task durations and larger
degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete …

Cited by 811 · Related articles · All 32 versions

Making sense of performance in data analytics frameworks

K Ousterhout, R Rasti, S Ratnasamy… - … USENIX Symposium on …, 2015 - usenix.org

There has been much research devoted to improving the performance of data analytics
frameworks, but comparatively little effort has been spent systematically identifying the …

Cited by 546 · Related articles · All 27 versions

Firmament: Fast, centralized cluster scheduling at scale

I Gog, M Schwarzkopf, A Gleave, RNM Watson… - … USENIX Symposium on …, 2016 - usenix.org

Centralized datacenter schedulers can make high-quality placement decisions when
scheduling tasks in a cluster. Today, however, high-quality placements come at the cost of …

Cited by 289 · Related articles · All 17 versions

Shark: SQL and rich analytics at scale

RS Xin, J Rosen, M Zaharia, MJ Franklin… - Proceedings of the …, 2013 - dl.acm.org

Shark is a new data analysis system that marries query processing with complex analytics
on large clusters. It leverages a novel distributed memory abstraction to provide a unified …

Cited by 659 · Related articles · All 29 versions

Drizzle: Fast and adaptable stream processing at scale

S Venkataraman, A Panda, K Ousterhout… - Proceedings of the 26th …, 2017 - dl.acm.org

Large scale streaming systems aim to provide high throughput and low latency. They are
often used to run mission-critical applications, and must be available 24x7. Thus such …

Cited by 209 · Related articles · All 18 versions

Accelerating distributed MoE training and inference with Lina

J Li, Y Jiang, Y Zhu, C Wang, H Xu - 2023 USENIX Annual Technical …, 2023 - usenix.org

Scaling model parameters improves model quality at the price of high computation
overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) …

Cited by 24 · Related articles · All 7 versions

Hopper: Decentralized speculation-aware cluster scheduling at scale

X Ren, G Ananthanarayanan, A Wierman… - Proceedings of the 2015 …, 2015 - dl.acm.org

As clusters continue to grow in size and complexity, providing scalable and predictable
performance is an increasingly important challenge. A crucial roadblock to achieving …

Cited by 177 · Related articles · All 21 versions

GRASS: Trimming stragglers in approximation analytics

G Ananthanarayanan, MCC Hung, X Ren… - … USENIX symposium on …, 2014 - usenix.org

In big data analytics, timely results, even if based on only part of the data, are often good
enough. For this reason, approximation jobs, which have deadline or error bounds and …

Cited by 191 · Related articles · All 20 versions