Spark-DIY: A framework for interoperable Spark operations with high performance block-based data models

S Caíno-Lores, J Carretero, B Nicolae… - 2018 IEEE/ACM 5th International Conference on Big Data Computing …, 2018 - ieeexplore.ieee.org
Today's scientific applications increasingly rely on a variety of data sources, storage facilities, and computing infrastructures, and there is a growing demand for data analysis and visualization for these applications. In this context, exploiting Big Data frameworks for scientific computing is an opportunity to incorporate high-level libraries, platforms, and algorithms for machine learning, graph processing, and streaming; inherit their data awareness and fault tolerance; and increase productivity. Nevertheless, limitations exist when Big Data platforms are integrated with an HPC environment, namely poor scalability, severe memory overhead, and a substantial development effort. This paper focuses on a popular Big Data framework, Apache Spark, and proposes an architecture to support the integration of highly scalable MPI block-based data models and communication patterns with a map-reduce-based programming model. The resulting platform preserves the data abstraction and programming interface of Spark, without requiring any changes to the framework, but allows the user to delegate operations to the MPI layer. The evaluation of our prototype shows that our approach integrates Spark and MPI efficiently at scale, so end users can take advantage of the productivity afforded by the rich ecosystem of high-level Big Data tools and libraries built on Spark, without compromising efficiency or scalability.
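To illustrate the idea the abstract describes — keeping a map-reduce-style user interface while executing the aggregation over partitioned blocks, as an MPI layer would — here is a minimal, dependency-free Python sketch. All names in it (`Block`, `map_blocks`, `block_reduce`) are hypothetical stand-ins for illustration; they are not the Spark-DIY API, and a real deployment would run each block on an MPI rank rather than in a local list.

```python
# Conceptual sketch of a block-based map-reduce flow: a dataset is split into
# blocks (one per hypothetical MPI rank), a user function is mapped over each
# block independently, and the reduction combines per-block partial results --
# the step that an MPI layer could perform with a scalable collective.
from functools import reduce

class Block:
    """One partition of a distributed dataset, analogous to the block an MPI rank owns."""
    def __init__(self, data):
        self.data = list(data)

def map_blocks(blocks, fn):
    # Map side: apply the user function independently to each block (Spark-style map).
    return [Block(map(fn, b.data)) for b in blocks]

def block_reduce(blocks, op):
    # Reduce side: fold within each block first, then merge the partials across
    # blocks -- locally here, but conceptually an MPI reduction over ranks.
    partials = [reduce(op, b.data) for b in blocks]
    return reduce(op, partials)

# Example: sum of squares over 4 "ranks" holding 25 elements each.
blocks = [Block(range(i * 25, (i + 1) * 25)) for i in range(4)]
squared = map_blocks(blocks, lambda x: x * x)
total = block_reduce(squared, lambda a, b: a + b)
print(total)  # sum of squares of 0..99
```

The point of the two-phase reduce is that each block produces one partial value before any cross-block communication happens, which is what lets the cross-block merge be delegated to an efficient MPI collective without changing the user-facing map-reduce interface.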