Integrating Pig with Harp to support iterative applications with fast cache and customized communication
2014 5th International Workshop on Data-Intensive Computing in the …, 2014, ieeexplore.ieee.org
Using high-level scripting languages to solve big data problems has become a mainstream approach to sophisticated machine learning and data analysis. A computation often needs the same data in several steps to complete a full task. Composing the default data transformation operators with the standard Hadoop MapReduce runtime is very convenient. However, the current strategy for supporting iterative applications with high-level languages on Hadoop MapReduce relies on an external wrapper script written in another language, such as Python or Groovy, which causes significant performance loss because mappers and reducers are restarted between jobs. In this paper, we reduce this extra job startup overhead by integrating Apache Pig with Harp, a high-performance Hadoop plug-in developed at Indiana University. Harp provides fast data caching and customized communication patterns across iterations. The results show performance improvements by factors of 2 to 5.
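The difference between the two execution models can be illustrated with a minimal Python sketch. This is not Pig or Harp code: it uses 1-D k-means as a stand-in iterative workload, and the names `run_with_restarts`, `run_with_cache`, `partial_stats`, and `allreduce` are hypothetical. The "allreduce" here is only a local simulation of one possible customized communication pattern, and the sketch counts input reloads rather than modeling JVM startup or network cost.

```python
from typing import Callable, List, Tuple

Partition = List[float]  # hypothetical: each worker owns a list of 1-D points


def partial_stats(points: Partition, centroids: List[float]) -> List[Tuple[float, int]]:
    """Map step: assign each point to its nearest centroid; return (sum, count) per centroid."""
    stats = [(0.0, 0) for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        s, c = stats[j]
        stats[j] = (s + p, c + 1)
    return stats


def allreduce(all_stats: List[List[Tuple[float, int]]], old: List[float]) -> List[float]:
    """Simulated allreduce: combine per-worker stats so every worker gets the same new model."""
    new = []
    for j in range(len(old)):
        s = sum(st[j][0] for st in all_stats)
        c = sum(st[j][1] for st in all_stats)
        new.append(s / c if c else old[j])  # keep the old centroid if its cluster is empty
    return new


def run_with_restarts(load: Callable[[int], Partition], n_parts: int,
                      centroids: List[float], iters: int) -> Tuple[List[float], int]:
    """Wrapper-script style: each iteration is a fresh job that re-reads its input."""
    loads = 0
    for _ in range(iters):
        parts = [load(i) for i in range(n_parts)]  # new mappers, input reloaded
        loads += n_parts
        centroids = allreduce([partial_stats(p, centroids) for p in parts], centroids)
    return centroids, loads


def run_with_cache(load: Callable[[int], Partition], n_parts: int,
                   centroids: List[float], iters: int) -> Tuple[List[float], int]:
    """Cached style (sketch of the idea): load once, keep partitions in long-lived workers."""
    parts = [load(i) for i in range(n_parts)]
    loads = n_parts
    for _ in range(iters):
        centroids = allreduce([partial_stats(p, centroids) for p in parts], centroids)
    return centroids, loads


# Usage: two partitions, two centroids, five iterations.
DATA = [[1.0, 2.0, 10.0], [1.5, 11.0, 12.0]]
r1 = run_with_restarts(lambda i: list(DATA[i]), 2, [0.0, 5.0], 5)
r2 = run_with_cache(lambda i: list(DATA[i]), 2, [0.0, 5.0], 5)
print(r1, r2)
```

Both drivers converge to the same centroids, but the restart-based driver reloads every partition in every iteration (10 loads here), while the cached driver loads each partition once (2 loads). In a real Hadoop deployment the per-iteration cost also includes task and JVM startup, which this sketch does not model.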