Husky: Towards a more efficient and expressive distributed computing framework

F Yang, J Li, J Cheng - Proceedings of the VLDB Endowment, 2016 - dl.acm.org
Finding efficient, expressive, and yet intuitive programming models for data-parallel computing systems is an important and open problem. Systems like Hadoop and Spark have been widely adopted for massive data processing, as coarse-grained primitives like map and reduce are succinct and easy to master. However, an over-simplified API sometimes hinders programmers from exercising fine-grained control and designing more efficient algorithms. Developers may have to resort to sophisticated domain-specific languages (DSLs), or even low-level layers like MPI, but this raises development cost: learning many mutually exclusive systems prolongs the development schedule, and the use of low-level tools may result in bug-prone programming.
This motivated us to start the Husky open-source project, an attempt to strike a better balance between high performance and low development cost. Husky is developed mainly for in-memory large-scale data mining and also serves as a general research platform for designing efficient distributed algorithms. We show that many existing frameworks can be easily implemented and bridged together inside Husky, and that Husky achieves similar or even better performance compared with domain-specific systems.