Incoop: MapReduce for incremental computations

P Bhatotia, A Wieder, R Rodrigues, UA Acar… - Proceedings of the 2nd …, 2011 - dl.acm.org
Proceedings of the 2nd ACM Symposium on Cloud Computing, 2011 - dl.acm.org
Many online data sets evolve over time as new entries are slowly added and existing entries are deleted or modified. Taking advantage of this, systems for incremental bulk data processing, such as Google's Percolator, can achieve efficient updates. To achieve this efficiency, however, these systems lose compatibility with the simple programming models offered by non-incremental systems, e.g., MapReduce, and more importantly, require the programmer to implement application-specific dynamic algorithms, ultimately increasing algorithm and code complexity.
In this paper, we describe the architecture, implementation, and evaluation of Incoop, a generic MapReduce framework for incremental computations. Incoop detects changes to the input and automatically updates the output by employing an efficient, fine-grained result reuse mechanism. To achieve efficiency without sacrificing transparency, we adopt recent advances in the area of programming languages to identify the shortcomings of task-level memoization approaches, and to address these shortcomings by using several novel techniques: a storage system, a contraction phase for Reduce tasks, and an affinity-based scheduling algorithm. We have implemented Incoop by extending the Hadoop framework, and evaluated it by considering several applications and case studies. Our results show significant performance improvements without changing a single line of application code.
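The task-level memoization idea that the abstract takes as its starting point can be illustrated with a minimal sketch. This is not Incoop's implementation: the class `MemoizingRunner`, the function `word_count_map`, and the choice of SHA-256 over the split contents as the cache key are all assumptions made for illustration, and the sketch covers neither the contraction phase nor the affinity-based scheduler.

```python
import hashlib
from collections import Counter


class MemoizingRunner:
    """Toy task-level memoization (hypothetical, not Incoop's code).

    Map outputs are cached under a digest of the split contents, so an
    unchanged split is never mapped again on a later run.
    """

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.cache = {}     # split digest -> memoized map output
        self.map_runs = 0   # number of times map_fn actually executed

    def run(self, splits):
        total = Counter()
        for split in splits:
            key = hashlib.sha256(split.encode("utf-8")).hexdigest()
            if key not in self.cache:
                # Cache miss: this split is new or was modified.
                self.cache[key] = self.map_fn(split)
                self.map_runs += 1
            # Simplified Reduce step: sum the per-split counts.
            total.update(self.cache[key])
        return dict(total)


def word_count_map(split):
    """Map function for word count: word -> occurrences within one split."""
    return Counter(split.split())


runner = MemoizingRunner(word_count_map)
first = runner.run(["a b a", "c d", "e"])
runs_after_first = runner.map_runs

# Only the second split changed, so only one Map task is re-executed.
second = runner.run(["a b a", "c d x", "e"])
runs_after_second = runner.map_runs
```

In this sketch the second run re-executes a single Map task, while the Reduce step is still recomputed over all splits. The abstract indicates that shortcomings of plain task-level memoization are what the contraction phase for Reduce tasks and the other techniques are meant to address.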