DataScience4NP - A Data Science Service for Non-Programmers

B Lopes, A Pedroso, J Correia, F Araujo… - 10o Simpósio de Informática–INForum, 2018 - researchgate.net
Abstract
With the emergence of Big Data, the scarcity of data scientists able to analyse the data being produced in different domains became evident. Moreover, processing such amounts of data is also challenging with the technologies currently in use. With this in mind, DataScience4NP aims to explore the use of visual programming paradigms to enable non-programmers to join the data science workforce at a faster pace, while at the same time providing a scalable data science service. By observing the common process that data scientists employ to extract knowledge from data, which includes data insertion, preprocessing, transformation, data mining and interpretation/evaluation of results, we envisioned a system that performs all these steps without requiring users to program. Our solution therefore aims to provide an intuitive user interface where users can build personalized sequential data science workflows that are subsequently processed by a back-end service. The back-end service translates the received workflows to a lower-level representation, enabling separate scalable and distributed data science services to execute the translated tasks in parallel. The entire system is composed of different services containerized with Docker and orchestrated with Kubernetes, allowing it to be easily deployed in different clusters. To evaluate our tool, and particularly to verify whether the concept we envisioned for the creation and execution of data science tasks was intuitive, we conducted preliminary usability tests with two different groups of people, in which we observed a high level of user satisfaction. From the feedback obtained, it was clear that this concept of sequential workflows would bring added value to both novice and advanced data scientists.
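
The idea of translating a user-built sequential workflow into lower-level tasks can be illustrated with a minimal sketch. This is not the DataScience4NP implementation: the names `Step`, `Task`, `translate` and `run` are hypothetical, the step operations are toy stand-ins, and the tasks run in a single process here rather than on separate distributed services.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One block of a user-built workflow (hypothetical structure)."""
    name: str
    operation: Callable[[list], list]


@dataclass
class Task:
    """Lower-level representation of a step (hypothetical structure)."""
    task_id: int
    step_name: str
    depends_on: Optional[int]  # id of the task that must finish first
    operation: Callable[[list], list]


def translate(workflow: List[Step]) -> List[Task]:
    """Turn a sequential workflow into tasks with explicit dependencies."""
    return [
        Task(i, step.name, i - 1 if i > 0 else None, step.operation)
        for i, step in enumerate(workflow)
    ]


def run(tasks: List[Task], data: list) -> list:
    """Execute tasks in order, feeding each output to the next task.

    A real back-end would dispatch tasks to separate services; this
    sketch simply calls them in-process.
    """
    for task in tasks:
        data = task.operation(data)
    return data


# Toy workflow: insert a value, drop missing values, scale the rest.
workflow = [
    Step("data insertion", lambda rows: rows + [4.0]),
    Step("preprocessing", lambda rows: [r for r in rows if r is not None]),
    Step("transformation", lambda rows: [r * 2 for r in rows]),
]

tasks = translate(workflow)
result = run(tasks, [1.0, None, 3.0])
print(result)  # [2.0, 6.0, 8.0]
```

Making each task's dependency explicit is what would let a scheduler decide which tasks, for example those belonging to independent workflows, can run in parallel.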