XDataExplorer: a three-stage comprehensive self-tuning tool for Big Data platforms

Q Guo, Y Xie, Q Li, Y Zhu - Big Data Research, 2022 - Elsevier
To meet the challenges of massive data, many big data platforms have been used in
practice. In these data processing platforms, there are many correlated parameters that have …

An empirical study on the challenges that developers encounter when developing Apache Spark applications

Z Wang, THP Chen, H Zhang, S Wang - Journal of Systems and Software, 2022 - Elsevier
Apache Spark is one of the most popular big data frameworks that abstract the underlying
distributed computation details. However, even though Spark provides various abstractions …

Approximation with error bounds in Spark

G Hu, S Rigo, D Zhang… - 2019 IEEE 27th …, 2019 - ieeexplore.ieee.org
Many decision-making queries are based on aggregating massive amounts of data, where
sampling is an important approximation technique for reducing execution times. It is …
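The snippet only gestures at the technique, so here is a minimal sketch, in plain PySpark rather than the paper's system, of sample-based approximate aggregation with a CLT-style error bound; the synthetic table, the column name amount, and the 1% sampling fraction are illustrative assumptions.

```python
# Sketch only: approximate an aggregate from a sample and report a
# 95% confidence interval, assuming rows are sampled i.i.d.
import math
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("approx-agg-sketch").getOrCreate()

# Hypothetical input: one numeric column whose mean we want.
df = spark.range(0, 1_000_000).selectExpr("id", "rand() AS amount")

sample = df.sample(withReplacement=False, fraction=0.01, seed=42)

stats = sample.selectExpr(
    "avg(amount) AS mean",
    "stddev_samp(amount) AS sd",
    "count(*) AS n",
).first()

# Central-limit-theorem error bound on the sampled mean.
half_width = 1.96 * stats["sd"] / math.sqrt(stats["n"])
print(f"mean ~ {stats['mean']:.4f} +/- {half_width:.4f} (95% CI)")
```

Only the 1% sample is scanned and aggregated; the bound tightens as the sampling fraction grows.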

Evolutionary scheduling of dynamic multitasking workloads for big-data analytics in elastic cloud

F Zhang, J Cao, W Tan, SU Khan, K Li… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
Scheduling of dynamic and multitasking workloads for big-data analytics is a challenging
issue, as it requires a significant amount of parameter sweeping and iterations. Therefore …

[CITATION][C] DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility

H Kim, J Park, J Jang, S Yoon - CoRR, vol. abs/1602.08191, 2016

DeepCAT: A Cost-Efficient Online Configuration Auto-Tuning Approach for Big Data Frameworks

H Dou, Y Wang, Y Zhang, P Chen - Proceedings of the 51st International …, 2022 - dl.acm.org
To support different application scenarios, big data frameworks usually provide a large
number of performance-related configuration parameters. Online auto-tuning these …
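As a rough illustration of what such a tuner manipulates (not DeepCAT's algorithm), the sketch below varies one runtime-settable Spark SQL parameter and uses the wall-clock time of a fixed job as the feedback signal; the benchmark workload and the three candidate values are assumptions.

```python
# Sketch only: probe one real Spark configuration knob and measure the
# resulting job latency, the signal an online auto-tuner would optimize.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("config-probe").getOrCreate()

def run_workload() -> float:
    start = time.time()
    # Fixed benchmark: a wide aggregation whose cost depends on shuffle settings.
    (spark.range(0, 5_000_000)
          .selectExpr("id % 1000 AS key", "id AS value")
          .groupBy("key").sum("value")
          .count())
    return time.time() - start

# An auto-tuner would search this space; here we just enumerate three values.
for partitions in (8, 64, 200):
    spark.conf.set("spark.sql.shuffle.partitions", partitions)
    print(partitions, "->", round(run_workload(), 2), "s")
```

A real tuner explores a much larger, correlated parameter space and amortizes the probing cost online; this loop only makes the feedback signal concrete.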

Large scale distributed data science from scratch using Apache Spark 2.0

J Shanahan, L Dai - Proceedings of the 26th International Conference on …, 2017 - dl.acm.org
Apache Spark is an open-source cluster computing framework. It has emerged as the next-
generation big data processing engine, overtaking Hadoop MapReduce, which helped ignite …

[PDF] MapReduce/Bigtable for distributed optimization

KB Hall, S Gilpin, G Mann - NIPS LCCC Workshop, 2010 - researchgate.net
With large data sets, it can be time-consuming to run gradient-based optimization, for
example to minimize the log-likelihood for maximum entropy models. Distributed methods …
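The key observation behind distributed gradient methods is that the log-likelihood gradient is a sum over examples, so each partition can contribute a partial gradient that is combined in a reduce step. Below is a minimal sketch using PySpark RDDs, with logistic regression on synthetic data standing in for a maximum-entropy model; it is not the paper's MapReduce/Bigtable implementation, and the learning rate and iteration count are arbitrary.

```python
# Sketch only: batch gradient descent where per-example gradients are
# computed in parallel across partitions and summed with a reduce.
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("distributed-gradient-sketch").getOrCreate()
sc = spark.sparkContext

# Synthetic, linearly separable data generated from a known weight vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) > 0).astype(float)
data = sc.parallelize(list(zip(X, y)), numSlices=8).cache()
n = data.count()

w = np.zeros(5)
lr = 0.5
for _ in range(50):
    # Map: gradient of the negative log-likelihood for one example.
    # Reduce: sum the partial gradients from all partitions.
    grad = data.map(
        lambda xy: (1.0 / (1.0 + np.exp(-(xy[0] @ w))) - xy[1]) * xy[0]
    ).reduce(lambda a, b: a + b)
    w -= lr * grad / n

print("learned weights (direction should match the generating vector):", np.round(w, 2))
```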

SWAT: A programmable, in-memory, distributed, high-performance computing platform

M Grossman, V Sarkar - Proceedings of the 25th ACM International …, 2016 - dl.acm.org
The field of data analytics is currently going through a renaissance as a result of ever-
increasing dataset sizes, the value of the models that can be trained from those datasets …

Per-run algorithm selection with warm-starting using trajectory-based features

A Kostovska, A Jankovic, D Vermetten… - … Conference on Parallel …, 2022 - Springer
Per-instance algorithm selection seeks to recommend, for a given problem instance and a
given performance criterion, one or several suitable algorithms that are expected to perform …